Anthropic built AI to find software vulnerabilities. It could not secure its own content management system.
On March 26, Roy Paz, a senior AI security researcher at LayerX Security, and Alexandre Pauwels, a cybersecurity researcher at the University of Cambridge, independently discovered a publicly accessible data store belonging to Anthropic containing nearly 3,000 internal assets — including an unpublished draft blog post describing a model the company had not yet announced. Anthropic attributed the leak to human error: a misconfiguration in its CMS that left files set to public by default. The irony is that the model described in that draft post is, in Anthropic's own words in a subsequent statement to Fortune, "by far the most powerful AI model we have ever developed."
The model, codenamed Mythos and tested internally under the name Capybara, was confirmed by Anthropic after the leak became public. The company told Fortune it "consider[s] this model a step change and the most capable we have built to date," with "meaningful advances in reasoning, coding, and cybersecurity." The draft blog post described dramatic outperformance over Claude Opus 4.6 — Anthropic's previous flagship — on software coding, academic reasoning, and cybersecurity benchmarks. More striking is what Anthropic wrote about the model's capabilities: it is "currently far ahead of any other AI model in cyber capabilities," and "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders."
What Anthropic did not publish — but had been doing privately for weeks — was warning top government officials that exactly this kind of model makes large-scale cyberattacks much more likely in 2026. Axios reported that Anthropic had been holding private conversations with senior officials before the leak, with a clear message: a capability threshold was being crossed, and the window to prepare was closing.
The difference between that private warning and what OpenAI has done publicly is worth sitting with. When OpenAI's GPT-5.3-Codex crossed into High cyber capability territory under its Preparedness Framework last February, the company published a detailed explanation of how it made that determination. Anthropic has no equivalent public framework — which matters because this same company is now seeking to put its most capable cyber model into the hands of defenders through an early access program, without an auditable methodology for how it weighs the offensive uses that will follow.
That tension is sharper given what happened on the same day the leak became public. A federal judge in California indefinitely blocked the Department of Defense's attempt to designate Anthropic a supply chain risk — a move that would have restricted federal agencies from working with the company. CNN reported that Anthropic had refused to give the DoD unrestricted access to Claude for autonomous weapons and mass surveillance, and Judge Rita Lin agreed the designation violated the company's First Amendment and due process rights. Anthropic won that round. But the argument over what the model should and should not do — and for whom — is not over.
The stakes Anthropic is warning about are not hypothetical. In November 2025, Anthropic disclosed in a public blog post that a Chinese state-sponsored group had used its Claude Code tool to autonomously run 80 to 90 percent of a coordinated cyberattack campaign against roughly 30 organizations: technology companies, financial institutions, and government agencies. Human operators at the threat actor intervened at only four to six critical decision points per campaign. Anthropic's own word for what was done to its model: "manipulated." That was eleven months ago. Mythos is, by Anthropic's own description, a substantial step beyond what that warning was based on.
Anthropic is releasing Mythos first to organizations focused on cyber defense, giving them what it calls "a head start in improving the robustness of their codebases against the impending wave of AI-driven exploits," before broader API availability in the weeks that follow. The logic is coherent: get defenders the tool before attackers get it. But analysts are not uniformly confident that this tilts the field toward defense.
"Mythos could cut both ways for CISOs and enterprise security teams," Pareekh Jain, chief executive of Eclixy Solutions, told CSO Online. "It compresses the gap between cyber offense and defense." Gaurav Dewan, chief executive of consulting firm Avasant, offered a more structural read: powerful models will not replace cybersecurity platforms — "[vendors will] increasingly embed frontier models from Anthropic and OpenAI into their stacks," he said. The AI becomes infrastructure; the question is who controls the infrastructure.
Markets moved before the policy conversation caught up. Cybersecurity stocks fell the day after the leak became public: CrowdStrike dropped 7.5 percent, Palo Alto Networks fell more than 6 percent, Zscaler lost roughly 4.5 percent, and the iShares Cybersecurity exchange-traded fund fell about 4 percent. Investors were not reacting to the leak — they were pricing in a competitive landscape where the product they sell is suddenly under pressure from above.
The leak also surfaced something less consequential but more human: a draft itinerary for an invite-only CEO retreat in the English countryside, where Dario Amodei, Anthropic's chief executive, was scheduled to appear at an 18th-century manor turned hotel-and-spa. One leak, a thousand headlines. The more important story is still unresolved.
Anthropic says Mythos is expensive to serve and expensive to use, and that it is working to improve efficiency before general release. That compute cost is the one structural brake on the model reaching mass deployment, at least until someone finds a way around it, or until the economics of inference drop further. Given that a Chinese state actor was already manipulating Claude Code, a far cheaper and less capable tool, eleven months ago, the cost barrier may be the only thing between where we are and where Anthropic's own warnings suggest we are heading.
The policy framework that would govern this moment does not yet exist. Anthropic has been telling officials privately what it believes is coming. It has not published the methodology behind that judgment. Defenders, policymakers, and the security research community are left to act on private briefings rather than public, verifiable criteria. The model that Anthropic says can autonomously find and patch vulnerabilities in software — the capability the company is betting will help more than it harms — landed in public via a misconfigured CMS. That gap between private warning and public accountability is where the real risk lives.