When an AI lab with a hot product decides to ration it, something interesting is happening beneath the press release.
OpenAI has spent the past two months passing on business opportunities it would normally take, according to president Greg Brockman in an April 24 interview. The reason, in his words: "There is not going to be enough compute in the world to meet the demand" — compute being the raw processing capacity of the specialized chips AI systems run on. He was not being modest. He was describing a structural constraint that is now shaping which products get built and which customers get served.
OpenAI discontinued its Sora video generation app, citing compute constraints. Anthropic, which competes with OpenAI for the same limited pool of GPU capacity, confirmed publicly that compute is a constraint across the entire industry, and is rolling out its newest model, Mythos, only to select large firms rather than across its customer base. These are not abstract infrastructure questions. They are product decisions made under scarcity, and they are landing in real time.
OpenAI charges $5 per million input tokens and $30 per million output tokens for its newest model via API — double the rate of its predecessor. OpenAI serves roughly 900 million consumers and more than 1 million businesses. When demand runs ahead of supply at any price, raising prices is demand management, not premium positioning. In an internal email to staff, Nvidia CEO Jensen Huang called GPT-5.5 a "huge achievement" and evidence that AI systems can now do real work rather than just answer questions. OpenAI has committed to deploying more than 10 gigawatts of Nvidia systems for its next-generation infrastructure — a scale that would have seemed implausible three years ago, and one that gives OpenAI a position in the allocation queue most competitors cannot match.
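The arithmetic behind those rates is simple to sketch. A minimal illustration in Python, using only the per-token prices reported above; the request sizes are hypothetical examples, not OpenAI figures:

```python
# Published API rates for the newest model (from the article):
# $5 per million input tokens, $30 per million output tokens.
INPUT_RATE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 30.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call at the published rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A hypothetical call: a 2,000-token prompt and a 500-token reply.
print(f"${request_cost(2_000, 500):.4f}")  # $0.0250
```

At these rates, output tokens cost six times what input tokens do, so applications that generate long responses feel a price increase far more sharply than ones that mostly read.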
Inside OpenAI, the scarcity has been a fact of life for years. "Every team has people whose productivity is directly tied to how much compute they have," Brockman said. The most contested conversations are about allocation: which research program gets priority this quarter, which product ships, which engineers wait for a cluster to free up.
OpenAI's own tools are making the problem worse. More than 85 percent of OpenAI employees use Codex, its AI coding tool, every week across software engineering, finance, and product management. AI coding tools that make developers two to three times more productive also mean two to three times more shipped software sending inference requests back to the same infrastructure. Every efficiency gain in the development pipeline adds load to an inference pipeline that is already overstretched.
What changes next is measured in years, not quarters. TSMC is expanding chip production. Samsung, SK Hynix, and Micron are ramping high-bandwidth memory output. New GPU capacity comes online on a construction schedule that does not respond to demand signals. In the meantime, the allocation decisions being made today are reshaping the competitive landscape. Companies with long-term hardware supply agreements are shipping products. Companies without them are waiting in line — or making the trade OpenAI made with Sora: shutting one product down so that another can keep running.