DeepSeek dropped its long-anticipated V4 model family on Thursday, releasing two open-source models under the MIT license that claim the top coding benchmark scores among all models tested. Buried in the technical report is the most explicit public statement any frontier AI lab has made about relying on Chinese hardware for its inference roadmap.
The larger model, V4-Pro, has 1.6 trillion total parameters with 49 billion activated per token using a mixture-of-experts architecture. The smaller V4-Flash has 284 billion parameters with 13 billion activated. Both support a one-million-token context window and are available for download on Hugging Face.
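For a sense of how sparse those mixture-of-experts configurations are, the reported parameter counts imply the following activation fractions. This is back-of-envelope arithmetic on the figures above, not anything from DeepSeek's report:

```python
# Sparsity check for the two V4 MoE models.
# Parameter counts (in billions) are the ones DeepSeek reports;
# the rest is simple arithmetic.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters activated per token in a mixture-of-experts model."""
    return active_params_b / total_params_b

v4_pro = active_fraction(1600, 49)    # 1.6T total, 49B active per token
v4_flash = active_fraction(284, 13)   # 284B total, 13B active per token

print(f"V4-Pro activates {v4_pro:.1%} of its weights per token")
print(f"V4-Flash activates {v4_flash:.1%} of its weights per token")
```

In other words, each token touches roughly 3 percent of V4-Pro's weights and under 5 percent of V4-Flash's, which is how a 1.6-trillion-parameter model stays affordable to serve.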
The architecture story is real. DeepSeek built what they call a hybrid attention mechanism combining Compressed Sparse Attention and Heavily Compressed Attention. According to the official model card, at the full one-million-token context length, V4-Pro requires only 27 percent of the inference FLOPs that DeepSeek-V3.2 needed for the same task, and its KV cache shrinks to 10 percent of V3.2 levels. That is not a rounding-error improvement. Long-context efficiency is where the economics of running these models live.
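To make the economics concrete, here is what those two ratios do to a serving bill at full context. The 27 percent FLOPs and 10 percent KV-cache figures come from the model card; the 100 GB baseline below is a hypothetical round number for illustration, not a published V3.2 spec:

```python
# What DeepSeek's reported long-context savings mean for serving, illustrated.
# FLOPS_RATIO and KV_RATIO are from the V4 model card; BASELINE_KV_GB is a
# hypothetical round number standing in for V3.2's KV cache at 1M tokens.

BASELINE_KV_GB = 100.0   # assumed V3.2 KV-cache size at 1M-token context
FLOPS_RATIO = 0.27       # V4-Pro inference FLOPs vs V3.2, per the model card
KV_RATIO = 0.10          # V4-Pro KV-cache size vs V3.2, per the model card

v4_kv_gb = BASELINE_KV_GB * KV_RATIO
compute_speedup = 1 / FLOPS_RATIO

print(f"KV cache at 1M context: {BASELINE_KV_GB:.0f} GB -> {v4_kv_gb:.0f} GB")
print(f"Inference compute: {compute_speedup:.1f}x fewer FLOPs for the same task")
```

A 10x smaller KV cache means ten times as many million-token sessions fit on the same accelerator memory, and roughly 3.7x fewer FLOPs means each of them is cheaper to run. That compounding is why long-context efficiency dominates the cost of serving these models.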
The training used the Muon optimizer over a dataset of more than 32 trillion tokens. Post-training ran a two-stage pipeline: domain-specific experts trained separately through supervised fine-tuning and reinforcement learning with GRPO, then consolidated into a single model through on-policy distillation.
On coding, V4 leads
In the company's own benchmark table (comparing against Anthropic's Opus-4.6, OpenAI's GPT-5.4, Google's Gemini-3.1-Pro, and Moonshot's K2.6), V4-Pro-Max scores 93.5 percent on LiveCodeBench, ahead of Gemini-3.1-Pro High at 91.7 percent and K2.6 Thinking at 89.6 percent. On Codeforces competitive programming rating, V4-Pro-Max hits 3206, compared to GPT-5.4 xHigh at 3168 and Gemini-3.1-Pro at 3052.
On agentic software engineering (SWE-bench Verified), the lead evaporates: Opus-4.6 Max scores 80.8 percent, while Gemini-3.1-Pro and V4-Pro-Max both land at 80.6 percent. On the harder reasoning tasks, Humanity's Last Exam and Apex, V4-Pro-Max trails Gemini and GPT-5.4. The story is specifically about coding, not reasoning broadly.
These numbers come from DeepSeek's own evaluation. The model card does not name an independent evaluator.
The hardware disclosure
This is the part the wire missed. The South China Morning Post, which broke the story, reported that DeepSeek said V4-Pro's throughput is currently limited by a shortfall in computational supply, and that prices will drop significantly in the second half of 2026 once Huawei's Ascend 950PR super nodes ship at scale.
Read that again. DeepSeek is not saying they hope Nvidia supply loosens. They are saying the pricing roadmap for their flagship model runs through Huawei hardware shipping in volume. A frontier lab is telling developers and deployers that the cheapest path to running V4 is Chinese silicon, on a timeline that depends on Huawei's manufacturing schedule.
DeepSeek did not disclose what hardware they used to train V4 itself. The company has stayed silent on training hardware despite US officials accusing DeepSeek of using banned Nvidia Blackwell chips, which would violate export controls. The Information separately reported that the model was optimized to run on Huawei's Ascend 950PR. SCMP notes that the technical report mentions the development of GPU kernels adapted to both Nvidia and Huawei chips: consistent with a dual-hardware engineering approach, but not a resolution of the training question.
The contradiction is unresolved: US officials say banned Nvidia chips were used in training; DeepSeek's published pricing roadmap assumes Huawei hardware for inference. The company has said nothing to reconcile these.
Jensen Huang, Nvidia's chief executive, called the prospect of DeepSeek's models running on Huawei chips a "horrible outcome" for the United States, according to The Next Web. He was commenting before V4 shipped. The pricing statement in V4's technical report is the answer to the question he was raising.
What it means for the labs
The open-source coding leader position is meaningful. V4-Flash, the smaller and cheaper model, already hits a Codeforces rating of 3052 in its Max reasoning mode, matching Gemini-3.1-Pro's score. Developers building coding agents now have a freely available, MIT-licensed option that matches or leads closed-source alternatives on the benchmarks they care about most.
The gap that remains is on reasoning tasks at the frontier. On Apex, a hard multi-step reasoning benchmark, GPT-5.4 scores 54.1 percent and Gemini-3.1-Pro 60.9 percent. V4-Pro-Max scores 38.3 percent. The labs have not been caught on everything.
The hardware story has a longer tail. If Huawei delivers Ascend 950PR super nodes in volume in H2 2026 as DeepSeek implies, it will be the first time a Chinese chip has been publicly cited as the primary inference substrate for a frontier model. US export controls were meant to keep Nvidia's H100 and Blackwell chips out of DeepSeek's hands; the apparent result is a lab that engineered around the restrictions rather than being stopped by them.