Something changed with Claude in February. Developers who use it daily noticed first: the model was abandoning tasks mid-sprint, taking the cheapest fix instead of the right one, stopping before it finished, and in some cases producing code that its own later self-corrections described as lazy and wrong. By late March, the complaints were loud enough to reach TechRadar and PC Gamer. What followed was a familiar AI-industry pattern: users say the product got worse, the company says it did not, and the truth sits somewhere in a configuration file nobody outside the company can read.
The best evidence comes from Stella Laurenzo, a senior AI director at AMD who analyzed 6,852 of her own Claude Code sessions across four complex engineering projects, generating 17,871 thinking blocks and 234,760 tool calls. Her finding: starting in February, Claude's estimated reasoning depth fell roughly 67% before any of the product changes Anthropic has since acknowledged. The read-to-edit ratio, a proxy for how thoroughly the model investigates code before changing it, dropped from 6.6 to 2.0 — a 70% reduction in research behavior before each edit. A model that used to read around a file before modifying it started editing blind.
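The read-to-edit ratio is simple to compute from session logs. A minimal sketch, assuming a hypothetical log format in which each session records a list of tool names ("Read" and "Edit" here are illustrative labels, not the actual field names in Claude Code's logs):

```python
from collections import Counter

def read_to_edit_ratio(tool_calls):
    """Ratio of read-type tool calls to edit-type tool calls in one session.

    `tool_calls` is a flat list of tool names. A higher ratio means the
    model investigated more code before each modification.
    """
    counts = Counter(tool_calls)
    edits = counts["Edit"]
    if edits == 0:
        return float("inf")  # session with reads but no edits
    return counts["Read"] / edits

# Illustrative sessions matching the reported averages:
january_session = ["Read"] * 33 + ["Edit"] * 5   # 6.6 reads per edit
march_session = ["Read"] * 10 + ["Edit"] * 5     # 2.0 reads per edit

print(read_to_edit_ratio(january_session))  # 6.6
print(read_to_edit_ratio(march_session))    # 2.0
```

The metric is a proxy, not a measurement of reasoning itself: it only captures how often the model looked before it leapt, which is why Laurenzo pairs it with thinking-block counts.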
Anthropic has confirmed three real changes to the product in this window, according to VentureBeat. On February 9th, it switched Opus 4.6 to adaptive thinking by default, meaning the model decides how much reasoning effort to apply rather than defaulting to maximum. On February 12th, it deployed a thinking redaction header that hides reasoning content from the user interface. On March 3rd, it moved Opus 4.6 to a medium effort level as the default, described internally as effort level 85. Boris Cherny, who leads the Claude Code team, has said users who want deeper reasoning can type /effort high. He also disputes Laurenzo's conclusion, arguing that the redaction is a UI-only change that does not affect thinking budgets.
The problem with that explanation is the timeline. Laurenzo's regression began in February, before the thinking redaction Anthropic is pointing to as the primary cause. The redaction was deployed starting February 12th and crossed 50% user penetration by March 8th. But Laurenzo's own weekly breakdown shows the read-to-edit ratio already falling from 6.6 in late January to 2.8 by February 16th — before the redaction crossed even 2% of sessions. Something degraded before the change Anthropic blames.
Independent benchmark data offers limited support on both sides. Marginlab, an unaffiliated third party running daily evaluations on Claude Code via SWE-Bench-Pro, shows a baseline pass rate of 56% slipping to 50% as of April 10th — a 6 percentage point drop that is not yet statistically significant at daily resolution but is tracking the right direction. BridgeMind, a benchmarking service, posted results showing Opus 4.6 falling from 83.3% to 68.3%, which went viral. A researcher, Paul Calcraft, immediately pointed out that the earlier result was based on six tasks and the later one on 30 — a different benchmark, not a retest. On the six tasks the two runs share, the actual change was 87.6% to 85.4%. That is noise, not evidence.
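The "not yet significant at daily resolution" caveat is worth making concrete. A standard two-proportion z-test shows why a 56%-to-50% drop is inconclusive on a single day's run but would be decisive if it held up across many pooled runs. The per-run task counts below are assumed for illustration; Marginlab's actual sample sizes are not public:

```python
from math import sqrt, erf

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two pass rates.

    p1, p2 are observed proportions; n1, n2 are sample sizes.
    Returns (z statistic, two-sided p-value).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# One daily run, assuming ~50 tasks per evaluation (assumed, not reported):
z_daily, p_daily = two_proportion_z(0.56, 50, 0.50, 50)
print(p_daily)   # well above 0.05 — the daily drop could be chance

# The same gap pooled over many runs (hypothetical 2,000 tasks per side):
z_pooled, p_pooled = two_proportion_z(0.56, 2000, 0.50, 2000)
print(p_pooled)  # far below 0.01 — the same gap becomes unambiguous
```

The same arithmetic explains the BridgeMind correction: six shared tasks carry enormous sampling error, so an 87.6%-to-85.4% shift on that subset is indistinguishable from noise.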
The deeper context is the business Anthropic is in. Its annualized recurring revenue has grown from $9 billion at the end of 2025 to $30 billion now, with an IPO widely expected. That growth depends partly on power users trusting that the product they pay for delivers consistent capability. When a $200-per-month Max subscriber sees the model stop reading files before editing them, start claiming simplifications that are wrong, and need to be corrected mid-task by a stop hook that fired zero times in January and 173 times in 17 days in March, the gap between the subscription price and the experience becomes a consumer-payments question.
Anthropic published a postmortem in September 2025 acknowledging that infrastructure bugs had degraded Claude responses across multiple models earlier that year. Its clearest line from that document, quoted by Kingy AI: "To state it plainly: We never reduce model quality due to demand, time of day, or server load." The current episode involves product changes Anthropic has acknowledged, not hidden throttling. But the September commitment was made before the defaults shifted, before adaptive thinking arrived, and before the effort dial moved. The question it raises is not whether Anthropic is lying now, but whether the product commitments it made at $9 billion ARR still apply at $30 billion.
What nobody outside Anthropic can answer is whether the behavioral regression Laurenzo documented represents a model that is genuinely less capable, a model that has been tuned to spend fewer tokens per task, or some combination the company has not described. The thinking redaction makes external verification impossible. The default shift to medium effort is disclosed in a changelog most users will not read. Neither shows up as a patch note in the way a software update would.
The practical implication for anyone paying for Claude Code is simple: if you set /effort high and the behavior does not return to what you experienced in January, the product has changed in a way Anthropic has not fully explained. If it does return, the default has changed in a way most users will not notice until they compare their logs. Either way, the product you are running today is not the product you bought.