Hello, and welcome to another issue of China Chatbot!
The theme this week is “AI agents,” a vaguely defined, potentially game-changing upgrade to large language models (LLMs) towards which multiple tech start-ups are racing with mixed results. The government is eyeing them as well. A research lab in Beijing is training AI agents for social governance and control in urban areas across the country — and Wuhan is already getting started.
Enjoy!
Alex Colville (Researcher, China Media Project)
_IN_OUR_FEEDS(2):
Senior AI Scientist Warns Against China AI Hype
At the Zhongguancun AI Forum on March 29, Zhu Songchun (朱松纯), dean of the Beijing Institute for General AI, warned that China’s AI ecosystem suffers from deep “cognitive biases” and a distorted public narrative. He said that “decision makers, institutions and media” had been tasked with learning about AI in a very short period of time, inevitably leading to some distortions. He warned that the field was “exciting on the surface, but chaotic when it comes to substance” (表面热闹,实质混乱). He claimed, for example, that a group of leading companies creating large language models (LLMs), known collectively as the “Six Little Dragons of LLM” (大模型小六龙), were overvalued — and that staff at the AI centers springing up in universities across the country were under-qualified. Zhu stressed the need for China to avoid concentrating on LLMs and focus instead on foundational questions around the intelligence and architecture that underpin AI. While outlets with a track record of slightly more professional coverage, like Caixin and The Paper, reported Zhu’s comments, state news agency Xinhua and Party flagship newspaper the People’s Daily ignored them in their coverage of the forum (despite the former sending four journalists).

AI Crowd Control Simulator Released
Also at the Zhongguancun forum on March 29, the Beijing Institute for General AI (GENAI), a research and development non-profit tied to the elite Peking University, released what it claimed was China's first "Large Social Simulator" (大型社会模拟器). The simulator, which appears to have been launched as a prototype late last year, has been tested in a National Intelligent Social Governance Experimental Base (国家智能社会治理实验基地) in Wuhan, covering a 500-square-kilometer area of the city. In a Chinese context, “social governance” (社会治理) refers to the management of the relationship between the Party and the people, and so is also closely associated with monitoring and surveillance. According to reports from the Institute, the simulation will be used to develop and train “social-level intelligent agents” (社会级智能体) (see _EXPLAINER below for what an “AI agent” is) on the behavior of groups and crowds for purposes of “social governance,” “population policy” (人口政策) and “emergency management” (应急管理). The model collected data from the testing area on a variety of topics, including “enterprise characteristics, population structure, consumer behavior, and social and economic conditions,” with the team reporting that its input boosted traffic flow in the area and decreased queue lengths. The institute plans to create an updated version this year, which will also model the “complex social relationship networks” of government and enterprises. The institute also claimed the technology’s simulations of the past and present could be used to “deduce the future.”
TL;DR: Chinese state media is not a great source for the ground truth on how Chinese AI is doing. A Chinese AI scientist worries that despite the success of DeepSeek, Chinese tech is still fixated on copying the US, and believes other LLM start-ups have been blown out of the water by DeepSeek. Meanwhile, Wuhan is making serious inroads into applying AI to urban governance and social-control policies, possibly as a test case before the technology is rolled out across the country.
_EXPLAINER:
AI Agents (人工智能代理 / 智能体)
“The name’s Seek. DeepSeek.”
I’m sure someone found that funny. But no, AI agents have nothing to do with espionage.
Then what are they?
The thing Manus, a Chinese LLM that went viral last month, claims to be. But it’s a really fuzzy term, based more on aspirational dreams than solid reality: what exactly an AI agent does depends on who you ask. But Xinhua News Agency gives an interesting idea of what sets it apart from ordinary LLMs:
“For example, if you give the task 'buy coffee', an LLM will tell you 'I can't buy coffee for you directly' and give other suggestions; an AI agent will first break down how to buy coffee, and plan the steps of placing an order and paying using a certain app. According to these steps, it will call the app to select takeout, and then call the payment program to place an order and pay, without the user needing to specify each step.”
So in a nutshell, AI agents are active LLMs, armed with long-term memory, tech tools and an internet connection. In a perfect world they are a little assistant, doing tasks for you online or through the internet of things just as a netizen would, breaking a single prompt down into bite-size steps.
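To make that “buy coffee” example concrete, here is a minimal sketch of the loop in Python. Everything in it (the planner, the tool names, the coffee-buying steps) is a hypothetical stand-in rather than any real company’s agent API: in a real agent, the planning would be done by the LLM itself, and the “tools” would be real apps, browsers and payment systems.

```python
# Minimal, illustrative agent loop. All names below are hypothetical
# stand-ins, not a real vendor's API.

def plan_steps(goal: str) -> list[str]:
    """Stand-in for the LLM planner: break one goal into ordered steps."""
    if goal == "buy coffee":
        return ["open_delivery_app", "choose_coffee", "place_order", "pay"]
    return []

# Hypothetical "tools" the agent can call on the user's behalf.
TOOLS = {
    "open_delivery_app": lambda memory: memory.update(app="delivery app opened"),
    "choose_coffee":     lambda memory: memory.update(item="latte"),
    "place_order":       lambda memory: memory.update(order="order placed"),
    "pay":               lambda memory: memory.update(paid=True),
}

def run_agent(goal: str) -> dict:
    memory = {"goal": goal}            # crude long-term memory
    for step in plan_steps(goal):      # one prompt, broken into bite-size steps
        TOOLS[step](memory)            # call the matching tool, no user input needed
    return memory

print(run_agent("buy coffee"))
# {'goal': 'buy coffee', 'app': 'delivery app opened', 'item': 'latte', 'order': 'order placed', 'paid': True}
```

The shape of the loop is the point: one prompt in, a plan, then a chain of tool calls, with no further instructions from the user.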
Got an example?
We’ll stick to media-related ones. As all developers are aware, writing computer code can be brain-breaking and fiddly. But recently I tested a new coding agent called Cline from a Californian company. It built an entire website for me in ten minutes from just one prompt. Terrifyingly, Cline wrote and stored the code in files on my own personal laptop. I had essentially given a bot, created by people I don’t know, access to my personal files. That’s certainly a security issue.
You have to be very clear about exactly what you want from Cline, but this tech is spooky to watch, running through task after task by itself, remotely operating your internet browser or writing entire files of code which it saves directly to your computer.
But where does China fit into this?
Because movers in the field have known for a while that the industry is on the cusp of creating “general AI agents” for ordinary users (not just coders), which would be a level-up on the LLMs we have right now. A People’s Daily report from January 2024 pitched them as a way to revolutionize productivity for small companies. In November last year, Robin Li, CEO of Baidu, said they would become “the new carrier of content, information, and services in the AI era.” If an AI agent could do tasks like buying our favorite morning coffee, why would ChatGPT remain the go-to LLM?
Wait, so ChatGPT could have a big rival soon?
Bizarre to think, I know, but that’s how fast the field is moving. OpenAI has rolled out an AI agent of its own, but its tech is still limited to doing research for now.
But the pay-offs for getting agents right would be huge. They could be a very credible rival to LLMs that only gather information. So an AI agent is both a way for Chinese AI companies to get an edge over rivals and a way for China to gain that “lead-goose effect” (头雁效应) Xi Jinping wants from AI.
Have any Chinese AI agents been rolled out yet?
Besides Manus? Not many. AI start-up Zhipu AI (智谱AI) launched its own open-source AI agent, “AutoGLM Deepthink” (AutoGLM沉思), on March 31. Since early last year, Baidu and Chinese state media have claimed that Baidu’s Ernie Bot (文心一言) can act as an AI agent, but the only tasks they list are ones a normal LLM can already handle.
So there’s a bit of hype?
Yes, as with most novelties in AI everywhere. It also doesn’t help that no one can agree on what exactly an LLM must be able to do to be called an “AI agent,” which some tech companies and Party propaganda capitalize on for publicity. For example, that article from People’s Daily Online giving our definition of AI agents is also the article that labels Ernie Bot as an AI agent, even though Ernie Bot was never ever going to buy you coffee. AutoGLM Deepthink also still needs work (see this week’s _ONE_PROMPT_PROMPT).
AI agents are still tricky to get right. Tests on Manus by TechCrunch and MIT Tech Review show it still needs improvements before it hits one-prompt perfection, and Manus’s gallery of use cases never strays beyond research, writing and web design. At a talk last month, Manus co-founder Tao Zhang said their AI agent is still limited in the tools and information it has access to, and is frequently hindered by anti-bot software. So it’s still a work in progress for now.
_ONE_PROMPT_PROMPT:
Zhipu AI (智谱AI) is one of the “large language model six little dragons” that Zhu Songchun claims are overvalued due to media hype (see _IN_OUR_FEEDS). We can see this in action with the release, and media reception, of its newest product.
On March 31, the company released AutoGLM Deepthink, billed by its CEO as the world’s first AI agent to combine complex reasoning with the ability to perform practical tasks. Basically, that means it can do the sorts of tasks Claude does while also conducting in-depth research on the internet like ChatGPT (combining the two areas in one model is challenging).
AutoGLM Deepthink featured in positive coverage from multiple Chinese media outlets. Xinhua grandiosely hailed it as an agent that “can enable AI to jump out of the dialog box and do work for humans.”
But none of these outlets bothered to test the model or cross-check the CEO’s claims. AutoGLM Deepthink is not the only AI agent out there that integrates research with tasks — Zhipu is clearly working with competition from Manus in mind (see this week’s _EXPLAINER). AutoGLM Deepthink is also not without its drawbacks.
That isn’t to say the model is bad. It’s very good at research tasks. I asked it to create a line graph tracking the cumulative costs of Trump’s tariffs to the US stock market, day by day. Its conclusion was well-sourced, and a big step up from an earlier version I gave the same challenge to. That version concluded the total cost was $240 million. (If only the US were so lucky.)
However, this latest version seemed to have trouble combining research with the kind of tasks that Claude, the AI chatbot built by San Francisco-based Anthropic, manages with ease. Its attempt, for example, to turn the Trump tariff data into a line graph failed.
Another immediate problem is that the model unaccountably fails when it comes to even medium-term memory. When I asked it to find one-bedroom apartments in Taipei under 30,000 Taiwan dollars — I welcome human input (anyone?) — its thinking process showed it sorting through ideal areas of the city with what seemed to be accurate estimates. It judged them well by what I know of rent rates in different districts. But the answer it spat out in the end was back on Trump tariffs.
AutoGLM Deepthink’s limits are very easy to reach. When I asked it multiple questions from an evaluation benchmark designed specifically for AI agents, it performed well on the research-based ones. It struggled, however, with problems that require more logical reasoning. An admittedly complex question about a Rubik’s Cube left it stuck in a thought loop for a really, really, really long time.
Eventually, it spun off into an answer about coding scripts that had nothing to do with the initial question. So yeah, I broke it.
AI agents are a work in progress, so Zhipu AI’s attempt shouldn’t be knocked too hard. But I have another question that takes us back to the hype and commercial competition I spoke about earlier. If AI agents are the game-changer some have made them out to be, they should have lucrative commercial potential. So why has Zhipu AI released an unlimited version of AutoGLM Deepthink for free?