Something that looks like DeepSeek V4 went live this week at a price that makes AI developers do a double-take. The cost to process a million tokens of input: roughly $0.30. For output: about $0.50. The same operations on GPT-5.4 run $2.50 and $15.00 respectively, according to Jangwook.net, a site that tracks AI API costs. On those figures, V4 is roughly eight times cheaper on input and thirty times cheaper on output. DeepSeek has not announced a new model. Its own API documentation lists only V3.2. But that price, for a model not yet officially released, is the signal. DeepSeek has also closed its first outside financing round: Futu News reported last week that founder Liang Wenfeng had secured external funding.
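The per-million-token prices quoted above can be compared directly. The short sketch below does that arithmetic, using only the figures cited in this article. The model labels, the helper names, and the 3-million-input, 1-million-output workload are illustrative assumptions, not anything published by DeepSeek or OpenAI.

```python
# Prices in USD per million tokens, as quoted in this article.
# The dictionary keys are labels chosen for this sketch.
PRICES = {
    "V4 (as listed)": {"input": 0.30, "output": 0.50},
    "GPT-5.4": {"input": 2.50, "output": 15.00},
}


def price_ratio(expensive: str, cheap: str, kind: str) -> float:
    """How many times more the expensive model costs for one token kind."""
    return PRICES[expensive][kind] / PRICES[cheap][kind]


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of a workload at the quoted per-million-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


input_ratio = price_ratio("GPT-5.4", "V4 (as listed)", "input")
output_ratio = price_ratio("GPT-5.4", "V4 (as listed)", "output")

# Hypothetical workload: 3M input tokens and 1M output tokens.
gpt_cost = workload_cost("GPT-5.4", 3_000_000, 1_000_000)
v4_cost = workload_cost("V4 (as listed)", 3_000_000, 1_000_000)
blended_ratio = gpt_cost / v4_cost

print(f"input:   {input_ratio:.1f}x")
print(f"output:  {output_ratio:.1f}x")
print(f"blended: {blended_ratio:.1f}x")
```

The gap is widest on output tokens, at 30x, and narrower on input, at about 8.3x. The overall saving depends on the mix of input and output in a given workload.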
The price is possible because of what DeepSeek built underneath. DeepSeek V4 runs entirely on Huawei chips, with no NVIDIA silicon anywhere in the stack. The processor, the Ascend 950PR, outperforms NVIDIA's export-controlled H20 chip by up to 2.8x per card, according to benchmark figures attributed to Huawei's own Ascend computing president and reviewed by Weijin Research. US export controls were designed to prevent exactly this outcome. Jensen Huang, NVIDIA's CEO, described it in a recent podcast: "That is a horrible outcome for our nation," per The Next Web.
The CUDA-to-CANN migration is the engineering story. CUDA (short for Compute Unified Device Architecture) is NVIDIA's programming framework, the layer that lets AI models run on NVIDIA chips. CANN (Compute Architecture for Neural Networks) is Huawei's equivalent. DeepSeek spent months rewriting its core code to train AI models on Huawei's architecture instead of NVIDIA's, per Reuters. NVIDIA built the H20, a deliberately weakened chip that stayed under the restriction threshold, because Washington banned the sale of its fastest chips to China. Huawei's answer is faster on raw specs and fabricated by SMIC on a 5-nanometer-class process, per TrendForce. Major Chinese tech companies are treating it as a production alternative to NVIDIA: Alibaba, ByteDance, and Tencent have collectively ordered hundreds of thousands of Huawei's upcoming Ascend 910C chips, per The Information. Huawei plans to produce roughly 600,000 of those chips in 2026, doubling its 2025 output, per TrendForce.
DeepSeek V3 was trained on 2,048 NVIDIA H800 GPUs, a chip NVIDIA sold to China before the export ban of 2023, per SCMP. V4 required starting over. The rewrite worked well enough to run V4. Scaling it for mass traffic is the next test.
Whether Huawei hardware holds up under sustained production traffic is the open question. The benchmark figures are DeepSeek's own, released alongside the model, with no independent third-party replication published. The follow-on R2 model hit repeated training failures on the same silicon. The adaptation works for V4, but scaling to harder problems remains unproven.