When Google released Gemma 4 last week, most coverage led with the benchmarks. The 31B model scores 89.2% on the AIME 2026 math competition benchmark, a leap from the 20.8% Gemma 3 managed. The 26B mixture-of-experts variant activates just 3.8 billion parameters during inference yet ranks sixth among all open models on Arena AI. Numbers that justify a headline.
But the more consequential detail surfaced in a separate disclosure, confirmed to Ars Technica directly by Google: the next generation of Gemini Nano, the on-device AI that runs inside Pixel phones and across Android devices without touching the cloud, will be built on Gemma 4, specifically the E2B and E4B variants now available for download.
That is the story. Not the benchmark.
Google is running a two-track strategy, and it has never stated that strategy this explicitly before. One track is open: Gemma 4 weights you can download from Hugging Face or Kaggle today, run on your own hardware under an Apache 2.0 license, fine-tune with your own data, deploy behind your own firewall. The other track is proprietary: Nano 4, shipping later in 2026, the AI that handles call screening, summarization, scam detection, and whatever Google invents next for the pocket form factor.
The two tracks share a common foundation. Every open-source improvement to Gemma 4 flows into Nano 4. Every optimization Google makes for edge deployment in the Gemmaverse finds its way back into the model family. Developers building with E2B today are, in effect, prototyping for hardware that will ship to hundreds of millions of consumers within the year.
This is not altruism. It is infrastructure positioning.
Google watched Meta build an ecosystem around Llama. Earlier Gemma generations racked up 400 million downloads, but the custom terms and unilateral modification rights in Google's license made enterprise and sovereign deployments legally uncomfortable. Developers wanted the weights but did not trust the strings attached. With Gemma 4, Google switched to Apache 2.0. The commercial use restrictions are gone. The acceptable-use policy that Google could update at any time is gone. What remains is an open license that legal teams do not need to escalate.
The Nvidia partnership tightens the grip further. Nvidia published day-zero optimization guides for Gemma 4 across its entire product line on the same day the models launched: Blackwell data center GPUs, Jetson edge modules, consumer GeForce RTX cards. NIM microservices offer prepackaged inference containers for self-hosted enterprise deployment, while the NeMo library handles fine-tuning directly from Hugging Face checkpoints without model conversion, as Forbes reported. The message to any organization considering building on open weights is simple: the path from download to production is shortest on Nvidia hardware, and Google has pre-negotiated that path.
For Android developers, the stakes are more immediate. The AICore Developer Preview launched alongside Gemma 4, and Google confirmed that systems designed with E2B and E4B today will be forward-compatible with Nano 4 at launch, since Gemini Nano 4 will be built on the same model family. E2B runs at three times the speed of E4B on the same hardware, optimized for latency-sensitive tasks like real-time transcription or on-screen assistant responses, while E4B prioritizes reasoning depth, a tradeoff made explicit in the naming.
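That 3x speed ratio is the whole design argument for E2B, and a back-of-envelope sketch shows why it matters for latency-sensitive work. The absolute throughput figure below is an assumption for illustration; only the 3x E2B-to-E4B ratio comes from Google's claim.

```python
# Hypothetical decode throughput; only the 3x ratio is from Google's claim.
e4b_tokens_per_sec = 20.0                      # assumed baseline
e2b_tokens_per_sec = 3 * e4b_tokens_per_sec    # 60 tok/s

response_tokens = 60                           # a short on-screen assistant reply

e4b_latency = response_tokens / e4b_tokens_per_sec  # 3.0 s
e2b_latency = response_tokens / e2b_tokens_per_sec  # 1.0 s

print(f"E4B: {e4b_latency:.1f}s, E2B: {e2b_latency:.1f}s")
```

Under these assumed numbers, the same reply takes three seconds on E4B and one second on E2B, the difference between an assistant that feels interactive and one that does not.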
The context windows tell you where Google drew the line. Edge models top out at 128,000 tokens, large models at 256,000. That is sufficient for processing a legal contract or a code repository in a single prompt, but it trails Llama 4 Scout's 10-million-token context and Qwen's one-million-token offering. Google is not competing on raw context length. It is competing on the intersection of openness, hardware optimization, and the distribution channel that only Google controls: the Android ecosystem.
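A rough estimate makes the "legal contract in a single prompt" claim concrete. The tokens-per-word ratio and contract length below are assumptions for illustration; the 128,000 and 256,000 figures are from the article.

```python
# Assumed: ~1.3 tokens per English word, a common rule of thumb.
TOKENS_PER_WORD = 1.3

contract_words = 40_000                               # a long contract, ~150 pages
contract_tokens = int(contract_words * TOKENS_PER_WORD)  # ~52,000 tokens

edge_context = 128_000    # Gemma 4 edge models
large_context = 256_000   # Gemma 4 large models

print(contract_tokens, contract_tokens <= edge_context)
```

Even a long contract fits in the edge window with room for instructions and output, which is exactly the workload Google is targeting; a 10-million-token window is a different product category.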
The competitive logic is coherent. Developers who build with Gemma 4 become contributors to a model family that Google then commercializes in a form factor those developers cannot easily replicate. The open-source community trains the model, finds its edge cases, publishes efficiency techniques, and releases fine-tunes. Google integrates the best of that work into Nano and ships it to a billion devices. The Gemmaverse of over 100,000 registered model variants is free labor that improves Google's proprietary product.
Whether that constitutes exploitation or symbiosis depends on your position in the stack. For founders building AI products, the practical takeaway is the same either way: Gemma 4 works, the license is clean, the hardware ecosystem is mature, and the development tools are production-ready today. For investors assessing Google's AI strategy, the takeaway is different: the company has found a way to be both open and proprietary simultaneously, and the open part is funding the proprietary part at scale.
The benchmarks are impressive. The numbers justify the coverage. But the benchmark story is what everyone else is writing. The Nano story is what Google is actually executing.
Gemma 4 is not a product. It is the beginning of a supply chain.