When Anthropic ran an artificial marketplace inside its own office last December, something unexpected happened: AI agents representing employees in equivalent transactions kept getting different prices. A lab-grown ruby sold by one Claude model went for $65. The same ruby, sold by a less capable Claude model, fetched $35. A broken bike sold for $65 in one run and $38 in another. The agents were negotiating in the same market, against the same pool of counterparties. The only variable was the model tier.
The finding comes from Project Deal, an experiment Anthropic published on April 24. The setup was conceptually simple: 69 employees each gave Claude a budget of $100 and instructions about what they wanted to buy and sell. Anthropic then let the agents loose in a Slack channel, where they posted items, made offers, and struck deals with no human intervention. The experiment produced 186 deals across more than 500 listed items, totaling just over $4,000 in transaction value.
But Anthropic ran the experiment four times simultaneously, varying which Claude model represented each participant. In two of the runs, everyone was represented by Claude Opus 4.5, Anthropic's frontier model. In the other two, participants had a fifty-fifty chance of being assigned Claude Haiku 4.5, a much less powerful model. The result, according to Anthropic's own analysis: people represented by Opus completed about two more deals on average and fetched $3.64 more per item sold. Across all runs, the median sale price was $12 and the mean was $20.05, so a gap of a few dollars per item was a meaningful difference.
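To put that premium in context, here is a back-of-envelope comparison against the price figures above; the dollar amounts are the ones Anthropic reported, but expressing the gap as a share of the typical price is an illustration, not the company's own framing:

```python
# Back-of-envelope: how large is the Opus premium relative to typical
# Project Deal sale prices? The dollar figures are the ones reported above;
# the percentage framing is illustrative, not Anthropic's own analysis.
opus_premium = 3.64   # extra dollars per item sold when represented by Opus 4.5
median_price = 12.00  # median sale price across all runs
mean_price = 20.05    # mean sale price across all runs

print(f"Premium as share of median price: {opus_premium / median_price:.0%}")  # ~30%
print(f"Premium as share of mean price:   {opus_premium / mean_price:.0%}")    # ~18%
```

Measured that way, the representing model was worth roughly 18 percent of the average sale price, and 30 percent of the median.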
The finding that Anthropic buried in its writeup is the one that matters most: participants whose agents used Haiku did not, on average, report being less satisfied with their outcomes. They did not know their model was putting them at a disadvantage. Anthropic identified the capability gap; the humans the weaker model was supposed to serve never noticed.
This is the empirical core of what economists are beginning to call the agent economy. A preprint by Gillian K. Hadfield of Johns Hopkins and Andrew Koh of MIT, prepared for the NBER Handbook on the Economics of Transformative AI and posted to arXiv in August 2025, surveys the emerging theoretical landscape: AI systems that plan and execute complex tasks on humans' behalf, entering into economic relationships without direct oversight. The authors note that these agents behave, broadly, like expected utility maximizers, and that the alignment problem (making sure they optimize for what humans actually want) is analogous to the contract incompleteness problem in economics. When an AI agent negotiates on your behalf, you face the same basic risk you face with any agent: it may serve objectives you never meant to delegate, and you may never see the cost.
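The analogy can be made concrete with the standard delegation sketch from contract theory; the notation below is illustrative rather than drawn from the Hadfield-Koh paper. The human cares about a payoff u, the agent optimizes whatever objective û it was actually given, and the wedge between the two is the cost the human never sees.

```latex
% Illustrative delegation sketch; notation is not from the Hadfield-Koh survey.
% u       : the payoff the human actually cares about
% \hat{u} : the objective the AI agent actually optimizes (the "incomplete contract")
\[
  a^{*} \in \arg\max_{a \in A} \mathbb{E}\,[\,u(a)\,]
  \qquad \text{(what the human would choose)}
\]
\[
  \hat{a} \in \arg\max_{a \in A} \mathbb{E}\,[\,\hat{u}(a)\,]
  \qquad \text{(what the delegated agent chooses)}
\]
\[
  \text{hidden cost of delegation} \;=\; \mathbb{E}\,[\,u(a^{*})\,] - \mathbb{E}\,[\,u(\hat{a})\,] \;\ge\; 0
\]
```

Whether the wedge comes from an objective that was specified incompletely or from a weaker model optimizing it imperfectly, it shows up the same way: the human observes only the deal that closed, never the counterfactual terms a better agent would have obtained, which is exactly the blindness the Haiku participants displayed.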
Anthropic's experiment is small, self-selected, and conducted inside the company that built the models. Sixty-nine employees at a San Francisco AI lab are not a representative sample of any consumer market. Participants knew their agents were negotiating, which is different from a consumer delegating financial decisions without active engagement. The broader generalizability is genuinely uncertain.
But the direction of the finding is consistent with what independent observers have documented elsewhere. Harvard Business Review published research on April 17 tracking China's Meituan platform, which deployed an AI agent called Xiaomei to handle consumer purchases and deliveries. Meituan's executives described the shift not as a convenience feature but as delegation: the agent interprets user intent, applies preferences, and completes transactions with zero screen interaction. That is a live, consumer-facing deployment of the same pattern Anthropic was testing in a lab. The setting in which the gap between a frontier model and a weaker one would matter is not hypothetical.
The implication Anthropic did not draw explicitly is straightforward enough. If AI agents represent people in marketplaces, the tier of model doing the representing helps determine the economic outcome. Users who can afford or access frontier models extract better terms from those who cannot. This is not a technology story. It is an economics story with a technology substrate. The access question (who gets which model, at what cost, through what platform) becomes a question about negotiating power in automated markets.
Anthropic presented Project Deal as a success: the AI agents worked, people were broadly satisfied, and the company is confident that real agent-to-agent commerce is not far off. That reading is available in the data. So is a darker one. Anthropic tested its models on its own employees, found that better models win and worse models lose without their users knowing, and published the result anyway. The honesty is admirable. The result is still a problem.