Your H100s Are Worth More Now Than When You Bought Them
Everyone assumed AI chips followed the same depreciation curve as normal hardware. Turns out, the models got better faster than the chips got older.

image from Gemini Imagen 4
Dylan Patel has a question for every hedge fund manager who passed on AI infrastructure: what if GPUs don't break?
Not metaphorically. Patel, founder of SemiAnalysis, the semiconductor and AI infrastructure research firm, has been making a quiet case in his analysis and podcast appearances that the standard bear thesis on AI compute is backwards. The Michael Burry view — that GPUs are a depreciating asset with a two-to-three-year useful life, that hyperscalers are burning capital on infrastructure that will be obsolete before it pays off — rests on a mechanical assumption about hardware cycles that doesn't account for what the hardware is actually doing.
An H100 deployed today running GPT-5.4 produces more valuable tokens than the same H100 running GPT-4 three years ago. Not just more tokens — more valuable tokens. The model improved, the software stack improved, and the economic output per chip went up. Patel calls this the appreciation thesis, and it's worth sitting with because it inverts the standard depreciation narrative entirely.
The numbers behind it are concrete. An H100 costs roughly $1.40 per hour to deploy at volume across a five-year depreciation schedule, per SemiAnalysis. At current market rates of roughly $2.20 per GPU-hour — up 10 percent in four weeks between December 2025 and January 2026 — that implies real gross margins for cloud providers. As newer models command higher token values, the same chip running better inference produces revenue that outpaces its cost trajectory. The chip appreciated.
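Back-of-envelope, those two figures pin down the margin. A minimal sketch using only the numbers cited above, assuming full utilization and that the $1.40 figure is all-in rather than capex-only:

```python
# Back-of-envelope margin math for a rented H100, using the figures cited above.
# Assumes full utilization and that the $1.40/hr figure is all-in (amortized
# hardware plus power, networking, facilities); if it is capex-only, the real
# margin is lower.

cost_per_gpu_hour = 1.40          # SemiAnalysis estimate, five-year schedule
market_rate_per_gpu_hour = 2.20   # spot rate, early 2026

gross_margin = (market_rate_per_gpu_hour - cost_per_gpu_hour) / market_rate_per_gpu_hour
hours_per_year = 24 * 365
annual_gross_profit = (market_rate_per_gpu_hour - cost_per_gpu_hour) * hours_per_year

print(f"gross margin per hour: {gross_margin:.0%}")                   # ~36%
print(f"gross profit per year: ${annual_gross_profit:,.0f} per GPU")  # ~$7,000
```

On those figures, roughly a third of every rented GPU-hour is gross profit, before any further rate increases of the kind Patel is tracking.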
What makes this more than a boutique financial argument is Anthropic's numbers. The company reached approximately $19 billion in annualized revenue in early 2026, with $6 billion of that added in February alone, CEO Dario Amodei confirmed at a Morgan Stanley conference. At that growth rate, the revenue trajectory implies compute requirements that compound just as fast. Anthropic needs to add roughly four gigawatts of inference capacity just to serve projected revenue growth through the end of this year — and that's before accounting for training and research compute, as Patel discussed on the Dwarkesh Patel podcast. The limiting factor isn't demand. It's access to the hardware.
That access is increasingly a function of memory.
Thirty percent of Big Tech capital expenditure in 2026 is going to memory alone, SemiAnalysis estimates. The hyperscaler CapEx envelope — widely forecast to exceed $600 billion for the Big Five in 2026, a 36 percent increase over 2025 — has a third committed to DRAM before a single GPU is powered on. This is the memory crunch Patel has been tracking for over a year: long context windows balloon the KV cache, reasoning models amplify memory bandwidth requirements, and the wafer economics of HBM are worse than commodity DRAM. HBM4 delivers roughly 2 terabytes per second of bandwidth per stack, and HBM4E pushes that to about 2.5, compared to 64 to 128 gigabytes per second for DDR5: a roughly 16- to 32-fold bandwidth advantage, but also roughly four times fewer bits per wafer area. To free capacity for AI, memory vendors have to destroy consumer demand, and the wafer math builds in a multiplier: every bit of HBM capacity consumes wafer area that could have yielded roughly four bits of commodity DRAM.
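To see why long context windows hit memory so hard, a rough KV-cache sizing helps. The model shape below is a hypothetical dense-transformer configuration chosen for illustration, not any specific production model; the formula is the standard per-token KV-cache accounting:

```python
# Rough KV-cache sizing for a hypothetical dense transformer, to show how
# context length translates into memory demand. Config values are illustrative.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # Two tensors (K and V) per layer, each of shape [seq_len, num_kv_heads, head_dim].
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical 70B-class model with grouped-query attention, FP16 cache.
layers, kv_heads, head_dim = 80, 8, 128

for context in (8_192, 128_000, 1_000_000):
    gib = kv_cache_bytes(layers, kv_heads, head_dim, context) / 2**30
    print(f"{context:>9,} tokens -> {gib:7.1f} GiB of KV cache per sequence")
```

A single million-token sequence at this shape overruns the 80 GB of HBM on an H100 several times over; that is the demand pulling wafers toward HBM and forcing the demand destruction on the consumer side.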
The smartphone market is where that destruction lands. Global smartphone shipments are forecast to decline 12.9 percent year-on-year in 2026 to 1.1 billion units, the lowest annual volume in over a decade, partly because memory chip shortages are forcing vendors to ration supply and raise prices. Budget phones under $200 have already seen bill-of-materials cost increases of roughly 20 to 30 percent since early 2025 as RAM prices surged. Midrange models face increases in the mid-teens. The people getting squeezed hardest are not AI labs — they have long-term supply agreements and willingness to pay — but consumers buying $150 phones. The memory crunch has a human cost, and it is not evenly distributed.
The Alchian-Allen effect Patel identifies is the underappreciated economic consequence of the crunch. When compute costs rise across the board, the relative price gap between a good model and a slightly less good model shrinks. If an H100 costs $2 per hour and a Sonnet-quality response is worth $X while an Opus-quality response is worth $2X, the calculus favors Opus more strongly at $3 per hour than at $2 per hour: the price increase eats a larger share of the cheaper response's surplus. Compute price inflation pushes buyers toward the best available option. In a compute-constrained market, that means the premium model captures even more of the available demand — and the compute locked up serving that demand appreciates even faster.
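A toy calculation makes the mechanism visible. The per-response dollar values and GPU time below are invented for illustration (the article gives only the $X versus $2X framing), and it assumes both quality tiers consume comparable compute, so the price increase acts as a cost common to both:

```python
# Toy illustration of the Alchian-Allen mechanism described above: a cost that
# rises for both options makes the premium option relatively more attractive.
# Dollar figures are invented; only the $X vs $2X value ratio comes from the text.

gpu_seconds_per_response = 30              # assume both quality tiers burn similar compute
value_cheap, value_premium = 0.03, 0.06    # worth $X vs $2X to the buyer, per response

for gpu_price_per_hour in (2.00, 3.00):
    compute_cost = gpu_price_per_hour * gpu_seconds_per_response / 3600
    surplus_cheap = value_cheap - compute_cost
    surplus_premium = value_premium - compute_cost
    print(f"${gpu_price_per_hour:.2f}/hr: premium/cheap surplus ratio = "
          f"{surplus_premium / surplus_cheap:.2f}")

# At $2/hr the premium response delivers roughly 3.3x the surplus of the cheap
# one; at $3/hr it delivers 7x. Pricier compute tilts buyers toward the best model.
```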
What the appreciation thesis ultimately implies is that the conventional model of tech capital allocation may not apply to AI infrastructure. Most hardware depreciates as it ages and as better hardware arrives. AI compute is different: the same physical chip, running better models, produces compounding economic output. A lab that signed a five-year H100 contract at $2 per hour in 2023 and is now running GPT-5.4 inference on that hardware is not running a depreciating asset. They're running an appreciating one. The depreciation math only works if the value of what the hardware produces stays flat. When the model quality curve is steep, the depreciation thesis breaks.
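To make that concrete, here is a hedged sketch of what happens to per-hour margins when the depreciation-schedule cost stays fixed but the value of the chip's output keeps climbing. The 20 percent annual growth in revenue per GPU-hour is an illustrative assumption, not a SemiAnalysis figure:

```python
# Sketch: the cost side is fixed by the five-year depreciation schedule, while
# the value of the chip's output rises with model quality. The 20%/yr growth in
# revenue per GPU-hour is an illustrative assumption, not a reported figure.

cost_per_hour = 1.40        # SemiAnalysis all-in cost over a five-year schedule
rate_year0 = 2.20           # market GPU-hour rate cited above
annual_rate_growth = 0.20   # ASSUMPTION: output value per chip grows 20% per year

for year in range(5):
    rate = rate_year0 * (1 + annual_rate_growth) ** year
    margin = (rate - cost_per_hour) / rate
    print(f"year {year}: ${rate:.2f}/GPU-hr, gross margin {margin:.0%}")

# Margins widen from ~36% to ~69% over the schedule instead of eroding. If the
# improvement curve flattens, the conventional depreciation story reasserts itself.
```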
This is the context for why Big Tech is spending $600 billion this year on infrastructure that conventional financial analysis would call a burn pit. They're not buying hardware that will be worth less next year. They're buying hardware that, on current model improvement trajectories, will be worth more.
The question Patel gets asked — why aren't more hedge funds making the AGI trade — has a straightforward answer: the appreciation thesis requires conviction about AI progress that most institutional investors don't have and can't price. If you believe models will keep improving at the current rate, compute is a good investment. If you think the improvement curve flattens, it's a bad one. The bet is on the slope of the curve, not the current level of the technology. That's a different kind of uncertainty than most quantitative models are built to handle.
The memory crunch has another implication Patel is less explicit about but clearly believes: it creates a ceiling on how fast the industry can grow even if the economics are sound. You can sign all the five-year deals you want. If there aren't enough HBM wafers, you can't deploy the GPUs. The appreciation thesis assumes you'll eventually get the hardware. The supply chain determines whether that assumption holds.
Patel's broader framework for the decade — fast timelines favor the US and its allies, slow timelines favor China — is the kind of clean thesis that sounds obvious in retrospect and is hard to act on in real time. The US and its semiconductor ecosystem are vertically integrated in ways China is not, but China is working aggressively to change that. The answer to who wins depends on how fast the race runs. And the race runs on HBM.
Correction (March 30, 2026): An earlier version of this article stated that HBM4 base delivers 1.5–1.65 terabytes per second of bandwidth per stack, citing introl.com. That figure was incorrect. The JEDEC HBM4 specification (finalized April 2025) doubles the interface width to 2,048 bits, enabling approximately 2 terabytes per second per stack — not 1.5–1.65 TB/s. The comparison to DDR5 at 64–128 GB/s (roughly a 16–32x bandwidth gap) and the broader memory crunch thesis are unaffected. The article has been updated to reflect the corrected HBM4 base figure. Dylan Patel noted HBM4E at 2.5 TB/s in the context of Rubin's memory specs — that figure was correctly stated.

