OpenAI shipped a major overhaul of its coding agent on Wednesday — and immediately made it impossible for European enterprises to use the feature most worth scrutinizing. The company blocked its new desktop control capability in the European Economic Area, the United Kingdom, and Switzerland at launch, according to its developer changelog. The restriction likely reflects legal uncertainty under regional privacy frameworks, not a capability gap. But the timing is worth noting: those are exactly the jurisdictions where OpenAI's self-published code-quality benchmarks might face the most skeptical review.
The new Codex adds macOS desktop control — letting the agent operate a computer the way a human would, moving files, executing terminal commands, clicking through apps — alongside parallel task agents, persistent memory across sessions, image generation via the gpt-image-1.5 model, and integrations with more than 90 plugins. The company says Codex now has 3 million weekly users, up from 2 million last month. That growth matters because OpenAI is valued at $852 billion following a $122 billion fundraising round, and that valuation presupposes the company can hold its ground in the agent infrastructure race.
The race tightened in early 2026. In blind code quality evaluations conducted this year, Anthropic's Claude Code achieved a 67% win rate against OpenAI's Codex CLI, according to Tech Insider. Claude Code has been called the tool of choice for a growing number of businesses, TechCrunch reported. The competitive pressure appears to have pushed OpenAI to redirect resources toward Codex and enterprise tools, Reuters reported, a signal that the company recognized it was losing ground on a product it had once considered secondary.
The benchmark figures carry a caveat worth naming: Tech Insider, the source of the 67% win rate, does not publish its evaluation methodology. Reuters's reporting on OpenAI's resource redirection is cited via Wikipedia, which traces back to the original Reuters article. Both claims are plausible and consistent with what outside observers have noted publicly. Neither meets the standard of a primary source with disclosed methodology, a gap that matters when the company being evaluated also publishes benchmarks on its own products.
The geographic restriction illustrates the problem. If Codex's desktop control capability were unambiguously superior, European enterprises — among the most aggressive adopters of AI governance frameworks — would presumably be a priority market. Instead, the feature is absent at launch. Whether this reflects legal caution, a genuine capability gap, or something else entirely is not clear from the announcement. What is clear is that the region where OpenAI chose not to launch is also the region most likely to ask hard questions about whether its benchmark claims are real.
Desktop control is genuinely new territory — no major coding agent has offered structured computer use as a shipped feature. Memory and parallel agents are more incremental improvements on existing patterns. The deeper question for enterprise buyers is whether the new Codex closes the gap with Claude Code or simply narrows it. The answer likely depends on what workloads matter most to a given team. A developer who spends most of their time in the terminal may find little reason to switch. A team deploying automated workflows across desktop environments may find the desktop control feature decisive.
What to watch: whether OpenAI expands computer use availability to European markets within the next quarter, and whether independent benchmarks, rather than vendor-published ones, show that the 67% Claude Code win rate still holds now that the new Codex has shipped.