When researchers at University College London pulled apart Claude Code to understand how the coding agent worked, they expected to find the system organized around its artificial intelligence. What they found instead was a plumbing project with a language model attached.
The team, from UCL's VILA Lab in collaboration with Mohamed bin Zayed University of Artificial Intelligence, published a preprint this month on arXiv analyzing Claude Code v2.1.88, the agent Anthropic distributes to developers. Their method: download the TypeScript source code, which had briefly become publicly available on npm, and count what was there. The result: roughly 512,000 lines of code across nearly 1,900 files, according to the VILA Lab's analysis of the source. Of those, just 1.6 percent (a few thousand lines) was AI decision logic, the researchers found. The other 98.4 percent was permission gates, context management, tool routing, recovery routines, and the scaffolding that keeps a language model from doing things its users don't want.
The finding matters because it names something the industry has been quietly celebrating without examining: the real engineering in production AI agents is not in the intelligence. It is in the infrastructure around it.
The UCL paper catalogs that infrastructure in specific terms. Claude Code runs seven independent safety layers before every model call. It compacts context through five stages so the model doesn't lose the thread on long tasks. It manages 54 tools, responds to 27 hook events, and offers four separate extensibility mechanisms, per the VILA Lab's breakdown of the source code. It has seven permission modes that govern whether to ask before running shell commands, whether to allow network calls, and whether to write to a given directory. None of this is artificial intelligence. It is operational engineering, the kind of work that in any other sector would be called systems design and would not be confused with the product itself.
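The layered-gate pattern the paper describes can be sketched in a few lines. The sketch below is in TypeScript to match the language the agent is written in, but every type, rule, and name in it is invented for illustration; none of it is drawn from Claude Code's actual source.

```typescript
// Hypothetical illustration of independent safety layers run before a
// tool or model call. Each gate knows nothing about the others, so a
// bug or bypass in one check does not disable the rest.

interface ToolRequest {
  tool: string;      // e.g. "bash", "read_file", "write_file" (invented names)
  path?: string;     // target path, if the tool touches the filesystem
  network?: boolean; // does the call leave the machine?
}

type Gate = (req: ToolRequest) => boolean;

// Example rules, made up for this sketch:
const gates: Gate[] = [
  (r) => r.tool !== "bash" || !r.network,                 // no networked shell
  (r) => !r.path || !r.path.startsWith("/etc"),           // protect system dirs
  (r) => r.tool !== "write_file" || r.path !== undefined, // writes need a target
];

// A request proceeds only if every layer approves it.
function permitted(req: ToolRequest): boolean {
  return gates.every((gate) => gate(req));
}
```

The design point is composition: adding an eighth layer means appending one function, and no existing check has to change.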
The source code, briefly exposed on npm before Anthropic pulled it, also revealed a set of features the company had not announced publicly. Among them: an Undercover Mode that strips all Anthropic traces from commit messages and pull requests when Claude Code runs on public or open-source repositories. The feature auto-activates for public repos and is gated to Anthropic employees only. A comment in the source code reads: "You are operating UNDERCOVER in a PUBLIC/OPEN-SOURCE repository. Your commit messages, PR titles, and PR bodies MUST NOT contain ANY Anthropic-internal information. Do not blow your cover." There is no user-accessible override.
Also in the source: Kairos, a permanent memory system that runs between active sessions, consolidating facts about the user's codebase and preferences into long-term storage. Ultraplan, which can run deep task planning for up to 30 minutes on a single request using remote server-side compute. Voice input and output. Daemon execution modes. None of these are documented in the public product.
The security research firm Akto analyzed the architecture independently and found a pattern that complicates the narrative around Anthropic's safety posture. Claude Code's permission system assumes users will carefully evaluate each request. Anthropic's own internal review found a 93 percent prompt-approval rate — users were saying yes to almost everything. The company's response was not to add more warnings. It restructured the permission boundaries. The human oversight layer had become ritual rather than function, and the system was changed accordingly.
Akto also documented what the architecture cannot do: there is no audit trail across sessions, no cross-session pattern detection, and no persistent memory of what the system accessed or changed in prior conversations. Claude Code starts each session fresh. For enterprise security teams, the safety features that exist in the architecture are not backed by the logging or monitoring infrastructure that would make them enforceable after the fact.
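The gap Akto describes comes down to where session state lives. The following sketch is entirely hypothetical (the class and method names are invented, not from Claude Code); it shows only why history that is scoped to a session, with no durable sink, cannot support after-the-fact auditing or cross-session pattern detection.

```typescript
// Hypothetical model of session-scoped access history.
interface AccessRecord {
  tool: string;
  target: string;
}

class Session {
  private accesses: AccessRecord[] = [];

  record(tool: string, target: string): void {
    this.accesses.push({ tool, target });
  }

  count(): number {
    return this.accesses.length;
  }

  // When the session ends, its history goes with it. An enforceable
  // audit trail would instead flush `accesses` to durable storage here
  // before clearing; nothing in the described architecture does so.
  end(): void {
    this.accesses = [];
  }
}
```

A new `Session` starts with an empty history, which is exactly the "starts each session fresh" behavior the researchers observed.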
The UCL paper identifies four vulnerabilities in the current version and more than 50 subcommands that bypass the security analysis framework. The common characteristic: extensions execute before the trust dialog appears. The window between a user invoking a capability and the system checking whether that capability is permitted is wide enough to matter.
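The flaw follows a classic check-after-use ordering. A minimal sketch of the pattern, with every identifier invented for illustration rather than taken from the actual source:

```typescript
// Hypothetical demonstration of the ordering gap: in the vulnerable
// version, the extension's action executes before the trust decision
// is ever consulted, so the dialog appears only after the side effects
// have already happened.

type Action = () => void;

const trusted = new Set<string>();   // extensions the user has approved
const executed: string[] = [];       // side effects that actually ran
const dialogsShown: string[] = [];   // trust dialogs presented to the user

// Vulnerable ordering: run first, ask later.
function runSubcommandVulnerable(name: string, action: Action): void {
  action(); // side effects happen here, before any check
  if (!trusted.has(name)) {
    dialogsShown.push(name); // the dialog appears only now, too late
  }
}

// Safe ordering: the trust decision gates execution.
function runSubcommandSafe(name: string, action: Action): boolean {
  if (!trusted.has(name)) {
    dialogsShown.push(name);
    return false; // refuse until the user grants trust
  }
  action();
  return true;
}
```

Closing the window means moving the check, not widening the dialog: the fix is a reordering, which is presumably why the paper frames these as architectural findings rather than model failures.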
The 1.6 percent figure is not a measure of the model's importance. A weak model cannot drive a useful coding agent regardless of how much scaffolding surrounds it. But it is a measure of where the engineering effort goes — and where the competitive moat, if one exists, actually sits. Anthropic did not win the coding agent market primarily by having a smarter model than everyone else. It won by building the infrastructure that makes a model useful in a production environment. That infrastructure is substantial, deliberate, and, crucially, inspectable.
The source code was available on npm for hours before Anthropic pulled it. Researchers downloaded it. The architecture is documented. For any well-funded competitor, the question is no longer how Claude Code works. It is how fast they can build their own.
The researchers describe their work as a guide for future agent builders. Their GitHub repository carries a one-line self-description that doubles as the paper's quiet conclusion: a Unix utility, not a product. The 98.4 percent surrounding the model exists not because it is clever but because it has to be.
Anthropic declined to comment for this article.