Anthropic has told government officials that its not-yet-released Claude Mythos model makes large-scale cyberattacks substantially more likely in 2026, and that the system is currently far ahead of any other AI model in cyber capabilities, according to a draft blog post the company has not disputed. Those are extraordinary claims. They are also claims nobody outside Anthropic can evaluate — because the company has not published the evaluation criteria, benchmarks, or specific capability demonstrations that would allow independent researchers to assess whether the self-assessment is accurate.
The timing is awkward. Anthropic confirmed Mythos exists on March 26 after a configuration error in its content management system exposed roughly 3,000 internal files, including the draft blog post describing the model as "by far the most powerful AI we've ever developed" and a step change from its current most capable system, Claude Opus 4.6. Days later, an employee uploaded original source code to NPM instead of compiled code, exposing approximately 500,000 lines of Claude Code across 1,900 files — including KAIROS, an unreleased autonomous agent mode with nightly memory distillation and scheduled refresh cycles that Anthropic had not announced. Competitors can now reverse-engineer the agentic harness that drives Claude Code's autonomous task execution. Anthropic attributed both incidents to human error in release packaging.
Two significant data failures in the same week would be worth noting on their own. The larger question is what Anthropic is doing with the capability information it now knows is partially outside its control.
Anthropic has a better public record on transparency than most frontier labs. In November 2025, it published a detailed technical breakdown of a Chinese state-sponsored campaign in which AI performed 80 to 90 percent of the campaign's tactical operations independently — infiltrating roughly 30 organizations including defense contractors, financial institutions, and government agencies. At the attack's peak, the AI made thousands of requests per second. That level of operational disclosure is unusual. It covered a real incident, with specifics that could be verified by affected parties and independent researchers.
The September campaign demonstrated that AI-assisted cyberattacks at this scale are feasible using capabilities that were already public. Mythos is the successor — a model Anthropic says is substantially more capable, now the subject of private warnings to officials who cannot inspect it.
The company says it plans to release Mythos first to cybersecurity defenders before broader rollout — a phrase that sounds responsible. But those defenders will be working from the same unverified self-assessment; they cannot independently confirm whether the model's capabilities exceed what Anthropic has described. The officials receiving private warnings are in the same position: they are hearing from the entity best placed to assess the model's capabilities, with the strongest commercial incentive to be seen as powerful, and the primary source of warnings about the risks that power creates.
Labs have argued that private engagement with government is necessary because public disclosure of specific thresholds would help adversaries more than defenders. There is a real argument there. But it produces a structural outcome: Anthropic occupies a position where it is simultaneously the evaluator, the warned party, and the warner. That concentration of roles is the governance problem.
What to watch is whether Anthropic publishes the evaluation methodology behind its cyber capability claims. The company's track record suggests it understands this obligation. If the private warnings stay private, the accountability gap grows in proportion to the model's actual capabilities — and those capabilities are now, however inadvertently, a matter of public record.