Meta is already running Nvidia Grace CPUs at scale and has Vera in its roadmap. So why is it now buying AWS Graviton5?
That's the question the company's multi-year Graviton deal raises — and the answer reveals something the GPU-orthodoxy narrative has been missing about where agentic AI actually needs compute.
The conventional wisdom of the past few years held that GPUs were the only compute that mattered for AI. CPU was infrastructure housekeeping — orchestration overhead, some preprocessing, the unsexy plumbing that surrounded the real work. That story is being quietly rewritten.
Agentic AI workloads don't just run inference on GPUs. Real-time reasoning engines, orchestration loops, code generation, long-horizon memory management, and vector search operations are all CPU-intensive. The orchestration layer that coordinates multi-step AI tasks can consume 30–40% of total compute in agentic systems, according to an infrastructure engineer at a major AI lab who spoke with type0 on background because they weren't authorized to comment publicly. A single agentic workflow might make hundreds of API calls, manage state across distributed memory systems, and run inference chains that don't parallelize well on GPU tensors.
"GPUs are for the heavy lifting," the engineer said. "But the coordination overhead — that's all CPU. And as workflows get more complex, that overhead scales superlinearly."
Intel CEO Lip-Bu Tan confirmed in February that CPU demand for AI workloads is currently exceeding supply, a striking admission from a company whose own GPU efforts have lagged. "For the last few years, the story around high performance computing was almost exclusively about GPU and other accelerators," he said on Intel's Q1 2026 earnings call. Arm-based CPUs are projected to hold 90% of the AI ASIC server CPU market by 2029, per Counterpoint Research. The CPU comeback isn't theoretical.
Why three architectures?
Grace, Vera, and Graviton serve different roles in Meta's stack. Nvidia Grace (already deployed at scale) and Vera (on Meta's roadmap) are Nvidia's Arm-based server chips. Grace handles GPU-to-CPU communication with low latency, making it the natural home for GPU-adjacent orchestration in systems like Llama clusters. Vera, Nvidia's next-generation CPU, is expected to ship with updated NVLink and PCIe generations, targeting a different power/performance envelope than Grace.
Graviton5 — AWS's homegrown ARM chip with 192 cores, 5x the cache of its predecessor, and 33% faster core-to-core communication — is the hyperscale workhorse. The 25% per-core performance improvement over Graviton4 matters less than the core count: when you need to run thousands of lightweight orchestration threads simultaneously, you want many cores at moderate clock speed, not few cores at high frequency. AWS claims Graviton uses 60% less energy than comparable x86 instances, a meaningful variable at the scale Meta is procuring.
The strategic logic appears to be compartmentalization. Graviton handles the commodity orchestration plane — the load balancing, the request routing, the background memory sweeps. Grace stays close to the GPU fabrics where NVLink's bandwidth advantage is irreplaceable. Vera, when it arrives, slots into whatever new GPU-to-CPU interconnect architecture Nvidia ships next.
This is not elegance. It's hedging across three different technology curves.
The math that doesn't add up
Here is where the architecture story gets uncomfortable: Meta is simultaneously cutting jobs and investing in compute at a scale that requires scrutiny.
The same week Meta announced the Graviton5 deal, it confirmed layoffs affecting roughly 8,000 workers. It also signed infrastructure deals worth a combined $48 billion with CoreWeave and Nebius: GPU cloud rental contracts spanning multiple years, not annualized capex. Meta has separately guided its 2025 AI capex at $60–65 billion, with year-to-date spend already past $48 billion by Q3 2025. The scale of the infrastructure commitments is real. The comparison to the week's layoffs reads differently depending on which number you start from.
Building for three CPU architectures means three toolchains, three optimization passes, three kernel variants, three security update cadences. The operational complexity tax is real. Meta's infrastructure team has historically been large enough to absorb it; the layoffs suggest that calculus is being reconsidered.
What's driving the compute surge isn't mystery — it's agents. Multi-step AI systems that maintain context, call tools, and loop through reasoning chains require sustained CPU availability in ways that batch inference workloads simply don't. The GPU gets the headline. The CPU is the glue that holds the workflow together. And when you have thousands of concurrent agents running, the glue becomes the bottleneck.
No public benchmark compares Graviton vs. Grace vs. Vera performance on agentic workloads. Meta hasn't published that data; neither has Nvidia or AWS. The decision to run three architectures simultaneously is, in the absence of that data, an implicit bet that diversification beats single-vendor optimization — or that no single architecture is good enough for every layer of the stack.
That bet reflects a genuine uncertainty in the industry about what agentic AI actually needs from hardware. The frameworks are new. The workload profiles are still being characterized. Meta is building infrastructure for a compute pattern the industry hasn't fully mapped yet.
The Graviton deal is not a story about AWS and Meta. It's a data point in the industry's collective guess about where agentic compute is heading — and an admission that nobody is certain enough to commit to just one answer.
Meta declined to comment beyond its public announcement. AWS, Nvidia, and Intel did not respond to requests for comment by publication.