A human takes 20 hours to breach a corporate network. Anthropic's Mythos model can do it faster and cheaper — not because it is cleverer, but because it can spend more. The UK's AI Safety Institute ran the model against a 32-step simulated attack and found performance scaled linearly with compute all the way to a 100-million-token budget, with no ceiling and no diminishing returns. More tokens, better results. That is the proof-of-work argument at its most stripped-back.
Anthropic released Mythos on April 7 with Project Glasswing, a consortium backing the model with up to $100 million in usage credits to find and patch vulnerabilities at scale. The framing was a race: get capable models into defenders' hands before attackers acquire the same tools. AISI's results supported the case — Mythos completed the 32-step attack in three out of ten attempts, finished an average of 22 steps in the others, and solved 73 percent of expert-level CTF tasks that no model could touch before April 2025.
But the data Anthropic published also contains a result that cuts against its own narrative.
AISLE, an independent AI security company that has been running active vulnerability discovery against live targets since mid-2025, took the specific exploits Anthropic highlighted in its announcement, isolated the relevant code, and ran them against publicly available models. Eight out of eight models detected the FreeBSD vulnerability Anthropic called its flagship result, including a model with 3.6 billion active parameters priced at $0.11 per million tokens. An open model with 5.1 billion active parameters recovered the full attack chain for a 27-year-old OpenBSD vulnerability. On a basic false-positive discrimination task, small open models outperformed most frontier models from every major lab. The capability rankings reshuffled completely from task to task.
The finding, in AISLE's words: the moat in AI cybersecurity is the system, not the model.
Since mid-2025, AISLE has filed 15 CVEs against OpenSSL, including 12 out of 12 in a single security release, with bugs dating back more than 25 years and severity reaching CVSS 9.8 (critical). The OpenSSL team described the quality of the reports and the collaborative remediation as constructive. AISLE runs on models that are not Anthropic's, because the strongest performer varies by task: there is no stable best model across cybersecurity tasks. The capability frontier is jagged.
Drew Breunig, whose analysis of the AISI report first framed this as a proof-of-work problem, put the economics plainly: to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them. That is not a metaphor. AISI's own results confirm it — performance scales with token budget, no ceiling observed up to 100M tokens.
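Breunig's framing reduces to simple arithmetic: if capability scales roughly linearly with token budget, as AISI observed up to 100M tokens, then hardening a system is a spending race between defender and attacker. The sketch below makes that explicit; the tokens-per-step rate and the budgets are illustrative assumptions, not figures from the AISI report.

```python
# Toy model of the proof-of-work framing: progress on an attack (or on
# discovering the same exploits defensively) grows linearly with token
# spend, per AISI's observation up to 100M tokens. All numbers here are
# illustrative assumptions, not from the report.

def steps_completed(tokens: int, tokens_per_step: int = 3_000_000) -> int:
    """Linear scaling: work completed is proportional to tokens spent."""
    return tokens // tokens_per_step

def defender_wins(defender_tokens: int, attacker_tokens: int) -> bool:
    """A defender hardens a system by out-spending the attacker on the
    same exploit-discovery work, patching what it finds first."""
    return steps_completed(defender_tokens) >= steps_completed(attacker_tokens)

attacker_budget = 100_000_000   # 100M tokens, AISI's tested ceiling
defender_budget = 120_000_000   # defender must match or exceed the attacker

print(steps_completed(attacker_budget))                 # 33
print(defender_wins(defender_budget, attacker_budget))  # True
```

Under this model there is no point at which a defender is "done": security posture is a standing bid that an attacker with a larger budget can always outspend.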
The implications reach past the security industry. Open source libraries become more valuable in this environment — tokens spent securing a widely-used open source project are shared across every downstream user. That runs counter to the conventional wisdom that AI makes it cheaper to replace open source with vibe-coded alternatives. It also means the organizations with the largest token budgets have a structural security advantage that compounds over time.
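The open source point is cost-sharing arithmetic: tokens spent hardening a widely used library amortize across every downstream user, while a private replacement must be secured by each organization alone. A minimal sketch with purely illustrative numbers:

```python
# Cost-sharing arithmetic behind the open source claim: a token budget
# spent securing one shared library is amortized across all downstream
# users; a private "vibe-coded" replacement gets no such sharing.
# Numbers are illustrative assumptions, not from the article.

def per_user_cost(total_tokens: int, downstream_users: int) -> float:
    """Effective security spend per user when the work is shared."""
    return total_tokens / downstream_users

shared = per_user_cost(1_000_000_000, 50_000)  # harden one shared library
private = per_user_cost(1_000_000_000, 1)      # each org secures its own code

print(shared)   # 20000.0 tokens per user
print(private)  # 1000000000.0 tokens per user
```

The same arithmetic explains the compounding advantage of large token budgets: the organizations that can fund the shared hardening also capture it as a structural edge.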
What to watch next: whether Anthropic's Glasswing consortium actually closes the gap between its announcement and competitors' demonstrated capabilities, and whether the economics of compute-bounded security create a durable moat for well-funded defenders — or just raise the floor for what counts as a minimum viable security posture.