A paper posted to arXiv on Jan. 18, 2026 proposes a different way to think about how many AI agents a system can support. The key idea is deceptively simple: don't keep all your agents awake.
Holos, a system described in a 23-author paper from Shanghai Jiao Tong University, the Shanghai Innovation Institute, and Oxford, introduces what it calls the Nuwa engine. Agents in Nuwa exist as dormant records in a relational database, each indexed by a 32-bit hash. When a task arrives, the system instantiates only the agents it needs, runs them, then demobilizes them. Population size and active compute become separate variables. A system designed this way could claim it supports a million agents not because it runs a million simultaneously, but because it can call up any subset from persistent storage on demand.
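The separation of population from active compute can be illustrated with a small sketch. This is not the Nuwa engine, which has no released code; it is a minimal illustration that assumes SQLite as the relational store and CRC32 as the 32-bit hash, and every name in it (agent_key, build_registry, ActiveAgent, handle_task, the skill column) is hypothetical.

```python
import sqlite3
import zlib


def agent_key(name: str) -> int:
    """Hypothetical 32-bit key: CRC32 of the agent name (an assumption, not the paper's hash)."""
    return zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF


def build_registry(agents):
    """Store every agent as a dormant row; nothing is running after this returns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agents (key INTEGER PRIMARY KEY, name TEXT NOT NULL, skill TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO agents (key, name, skill) VALUES (?, ?, ?)",
        [(agent_key(name), name, skill) for name, skill in agents],
    )
    conn.commit()
    return conn


class ActiveAgent:
    """In-memory stand-in for an instantiated agent."""

    def __init__(self, key, name, skill):
        self.key = key
        self.name = name
        self.skill = skill

    def run(self, task):
        return f"{self.name} ({self.skill}) handled: {task}"


def handle_task(conn, skill, task, limit=2):
    """Wake only the agents a task needs, run them, then release them."""
    rows = conn.execute(
        "SELECT key, name, skill FROM agents WHERE skill = ? ORDER BY key LIMIT ?",
        (skill, limit),
    ).fetchall()
    active = [ActiveAgent(*row) for row in rows]  # instantiate the needed subset
    results = [agent.run(task) for agent in active]
    active.clear()  # demobilize: the rows stay in storage, the objects do not
    return results
```

In this sketch the number of rows in the table can grow without changing how many ActiveAgent objects exist at any moment, which is bounded by the limit argument. A real system would also have to handle hash collisions, which a 32-bit key makes likely well before a million entries.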
The paper calls this the L5 tier of a five-level taxonomy for multi-agent systems. L1 is a single agent. L5 means fluid emergence across more than 10^6 agents, with gene-based heterogeneity and continuous operation. Whether any of this has been demonstrated at that scale is a different question from whether the architecture is designed for it.
What makes Holos structurally distinct is its Orchestrator. Rather than a central scheduler assigning tasks, Holos uses a market mechanism. Agents submit proposals for tasks in a blind auction, much as fast System 1 and deliberate System 2 thinking might compete in Kahneman's framework. The winner is selected based on a scoring function the paper does not fully specify. Whether that produces better outcomes than a well-tuned scheduler is a question the paper acknowledges but does not resolve with data.
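A blind auction of this kind can be sketched in a few lines. Because the paper does not fully specify its scoring function, the one below (self-reported confidence minus a weighted cost estimate) is purely an assumption for illustration, as are the Proposal fields and the function names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Proposal:
    agent_id: str
    confidence: float  # self-reported, assumed to lie in [0, 1]
    cost: float        # estimated cost, in arbitrary units


def score(proposal: Proposal, cost_weight: float = 0.5) -> float:
    """Hypothetical scoring rule; the paper's actual function is not specified."""
    return proposal.confidence - cost_weight * proposal.cost


def run_blind_auction(
    task: str,
    bidders: Sequence[Callable[[str], Proposal]],
    cost_weight: float = 0.5,
) -> Optional[Proposal]:
    """Collect sealed proposals and return the highest-scoring one."""
    # Each bidder is given only the task, never the other proposals.
    proposals = [bid(task) for bid in bidders]
    if not proposals:
        return None
    # Highest score wins; ties go to the lexicographically smallest agent_id.
    return min(proposals, key=lambda p: (-score(p, cost_weight), p.agent_id))
```

The sketch makes the open question concrete: everything that distinguishes this from a scheduler lives in the score function, so the outcome is only as good as that rule.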
The paper's third major claim is endogenous value alignment, meaning values that emerge from the system's own operation rather than being specified from outside. The argument leans on a No Free Lunch theorem to claim that without external constraints, an agent population will tend toward homogenization collapse. The market mechanism and the S-MMR diversity algorithm are proposed as fixes. S-MMR adds controlled noise to semantic matching to prevent all agents from converging on the same tools. Whether this is a genuine solution to a real alignment problem or an elaborate description of a task queue with a diversity hyperparameter is not a question the paper answers cleanly.
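For readers who want to see what a diversity hyperparameter of this sort looks like, here is a rough sketch. It assumes S-MMR resembles standard maximal marginal relevance with Gaussian noise added to the relevance scores; that reading is a guess from the description above, not the paper's algorithm, and the names noisy_mmr, lam, and noise are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def noisy_mmr(query, tools, k, lam=0.7, noise=0.05, seed=None):
    """Pick k tool names, trading relevance to the query against redundancy.

    tools maps a tool name to its embedding vector. lam=1.0 ranks by relevance
    alone; lower values penalize similarity to tools already selected.
    """
    rng = random.Random(seed)
    # Assumed form of the "controlled noise": Gaussian jitter on each relevance score.
    relevance = {
        name: cosine(query, vec) + rng.gauss(0.0, noise) for name, vec in tools.items()
    }
    selected = []
    remaining = list(tools)
    while remaining and len(selected) < k:

        def mmr(name):
            redundancy = max(
                (cosine(tools[name], tools[s]) for s in selected), default=0.0
            )
            return lam * relevance[name] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With noise set to zero this reduces to ordinary maximal marginal relevance, which is part of why the task-queue reading is hard to rule out: the mechanism is two tunable numbers, lam and noise, layered on a similarity search.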
The tool ecosystem Holos targets is MCPZoo, a repository the paper says provides more than 8,000 standardized tool servers. If the architecture scales and MCPZoo matures, tool diversity becomes the real constraint, not agent count. That framing is coherent. It also makes MCPZoo the load-bearing dependency of the whole system, which the paper treats as a feature rather than a risk.
The paper traces its intellectual lineage to Kevin Kelly's The Inevitable, in which Kelly described Holos as a planetary-scale collective intelligence emerging from the interconnection of humans, machines, and their environment. Corresponding authors Yuanjian Zhou at the Shanghai Innovation Institute and Weinan Zhang at SJTU's School of Computer Science are gesturing toward that scale. Zhang is a professor whose research covers reinforcement learning, agentic AI, and embodied AI.
The open-source release is announced on holosai.io. The website displays a Coming Soon page with no repository link and no released artifacts. The SJTU-SAI-Agents GitHub organization hosts implementations of related projects including X-Master and ML-Master, but nothing under the Holos or Nuwa name. A paper that names its own open-source home and has no shipped code is a familiar category: the announcement-to-implementation gap. That gap does not make the architecture uninteresting. It means the claims should be held at arm's length until something actually runs.
What the paper does provide is a coherent architectural argument for why population scale and compute cost do not have to grow together, a market-based alternative to central orchestration, and a taxonomy that gives the L5 goal a name. Those are worth taking seriously as design propositions. Whether they survive contact with a real workload is a different story, and that story has not been written yet.