DeepSeek V4 has finally launched, after nearly five months of anticipation. The release pairs a 1T-parameter MoE flagship with a 285B-parameter Flash version, and the full 1.6T Pro version followed quickly; all are fully open-sourced on GitHub under the Apache 2.0 license, with weights and deployment code released simultaneously.
As soon as the model was released, the capital markets responded in three distinct yet interconnected ways.
The A-share computing power chain surged nearly across the board. Cambricon extended its winning streak to 11 sessions, rising 3.7% on the day for a cumulative monthly gain of more than 60%. Huawei Technologies touched its 10% daily limit intraday before closing up 8.4%. SMIC's A-shares rose 4.91% and its H-shares 8.81%. HSMC's H-shares surged as much as 18% before closing up 12%. The CSI Semiconductor Chip ETF drew 2.4 billion yuan of inflows in a single day, pushing its assets to a record high.
The reaction among the Hong Kong market's large-model companies was of a different nature. WiseSpec (02513.HK) fell 8.07%, with a short-selling ratio of 9.9%. MiniMax (00100.HK) dropped 7.40%, and its short-selling ratio soared to 22.87%, the highest single-day short-selling figure in Hong Kong's AI sector in the past three months. Both companies are representatives of the Hong Kong market's AI listing wave in the second half of 2025, and both IPO prospectuses cite the same core competency: a "self-developed large model."

The reaction on the other side of the Pacific was equally specific. NVIDIA opened down 1.8% overnight, fell as much as 2.6% intraday, but closed flat. Bloomberg's market brief compared this consolidation to the V3 "DeepSeek moment" of January 27. The difference: the January event was a panic sell-off that erased $600 billion of market value in a single day, whereas this time looks more like a repricing, modest in scale but clear in direction. A new line has appeared in buy-side research notes: "Chinese AI inference demand is beginning to decouple from North American AI inference demand."
Taken together, these three market reactions form the first verdict the market wrote within 24 hours of V4's landing. After open source's victory, money has begun to pick sides again. What gets priced is no longer just the model itself, but where the model runs and which industry chain it sits in.
The timing window of the V4 release itself is part of the reason this reaction was amplified.
Looking back over the past 30 days, between March 26 and April 24, there were at least 11 significant large-model releases or major updates globally, covering nearly every major player: Anthropic Opus 4.6, Google Gemini 3.1 Pro, OpenAI GPT-5.5, Mistral Large 3, Meta Llama 4, Moonshot AI's Kimi K2.6, Alibaba's Qwen3-Next, ByteDance's Doubao 2.5 Pro, Tencent's Hunyuan 3.0, Kimi K2.6 Plus, and finally DeepSeek V4, released in the early hours of April 23.
That averages out to a new model every 2.7 days, a pace at which even fund managers cannot keep up with the release notes. Yet on the AI asset trading charts of these 30 days, only one name stands out consistently. On April 8, GPT-5.5 drove NVIDIA to a 4.2% single-day surge and a new high. Then, on April 23-24, DeepSeek V4 led China's computing power chain to consecutive surges.

The difference is not in the models' capabilities themselves. The gap between these 11 models on the LMArena leaderboard rarely exceeds 50 points; they sit within the narrow band of the "same tier." The difference lies in the combination of two things.
The first is open-sourcing. Among these models, only Llama 4 is also open source, but Llama 4's weight license comes with a long list of commercial-restriction clauses; the European and American developer communities gave it lukewarm reviews, and it dropped out of OpenRouter's top ten by the third day after launch. V4 ships under Apache 2.0: unrestricted weights, no commercial limitations, and inference code released in sync. It is the first open-source flagship in six months to pressure the closed-source camp on performance, price, and openness simultaneously.
The second is timing. Against a backdrop of continuous big moves from the closed-source camp, the open-source narrative had been repeatedly squeezed: Opus 4.6 pushed the SWE-Bench coding benchmark to a new high, and GPT-5.5 anchored pricing at $1.25 per million tokens. The debate over whether open source can catch up with closed source has run in Silicon Valley for two years. V4, estimated to reach 90 million active users within a month, has put that debate on pause with an open-source flagship.
As a domestic large fund manager put it during a roadshow: "Before V4, we applied a discount to the valuations of open-source large models. After V4, that discount started to reverse."
The V4 release announcement contained a line that had never appeared in the official documentation of any Chinese large model: "Day 0 full-stack adaptation for Cambricon SNN 590 and Huawei Ascend 950PR; deployment code open-sourced in sync." Understanding the weight of this line requires connecting three parallel threads that have unfolded over the past 12 months: hardware, software, and Silicon Valley's reaction.
The first thread is on the chip side. Huawei's Ascend 950PR entered mass production in December 2025, with 1.56 PFLOPS of FP4 compute and 112GB of HBM capacity, the first time a domestically produced AI chip has directly challenged NVIDIA's B series on hardware metrics. On a 1T-parameter V4 MoE inference task, its single-card throughput is 2.87 times that of the H20. The accompanying CANN 8.0 software stack optimizes the LLM inference framework down to the operator level; DeepSeek's published benchmark shows V4's end-to-end inference latency on an Ascend SuperNode (8x 950PR) is 35% lower than on an equivalent H100 cluster. The Cambricon SNN 590 is even more aggressive: single-chip FP8 compute comparable to the H100, at less than half the price.
The second thread is on the software side. The vLLM mainline merged a Cambricon MLU backend PR on April 22, the first time the open-source inference framework has natively supported a non-NVIDIA domestic accelerator. The Hygon DCU took a different path, through the ROCm ecosystem, but can fully run V4's MoE routing layer. V4 deployment is therefore no longer "runs only on one particular domestic GPU" but "choose among multiple domestic GPUs." The ecosystem's reliance on a single vendor has been broken, a key turning point for production.
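The "MoE routing layer" that each backend must support is, at its core, a per-token top-k gate: a small linear layer scores all experts, the top few are kept, and their softmax weights are renormalized. The following NumPy sketch illustrates that routing step only; the shapes, expert count, and top-k=2 choice are illustrative assumptions, not V4's actual configuration.

```python
import numpy as np

def moe_route(hidden, gate_w, top_k=2):
    """Route each token to its top-k experts.

    hidden: (tokens, d_model) activations
    gate_w: (d_model, n_experts) router weights
    Returns (idx, w): per-token expert ids (descending by score)
    and softmax gate weights renormalized over the selected experts.
    """
    logits = hidden @ gate_w                           # (tokens, n_experts)
    # Indices of the top_k largest logits per token, highest first
    idx = np.argsort(logits, axis=-1)[:, -top_k:][:, ::-1]
    top = np.take_along_axis(logits, idx, axis=-1)     # (tokens, top_k)
    # Softmax over only the selected experts, a common MoE convention
    e = np.exp(top - top.max(axis=-1, keepdims=True))
    w = e / e.sum(axis=-1, keepdims=True)
    return idx, w

# Toy usage: 4 tokens, hidden size 8, 16 experts (all hypothetical)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
gate = rng.normal(size=(8, 16))
idx, w = moe_route(tokens, gate, top_k=2)
print(idx.shape, w.shape)  # → (4, 2) (4, 2)
```

In a real trillion-parameter deployment the hard part is not this gate but dispatching tokens to experts sharded across devices, which is exactly the layer a new hardware backend has to get right.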
The third thread comes from Silicon Valley. On April 15, at TSMC's financial analyst day, Jensen Huang was pressed by analysts about the progress of Chinese domestic computing power. His answer was blunt and specific: "If they can really free LLMs from CUDA, it would be a disaster for us." Nine days later, DeepSeek answered with a single Day 0 announcement.

The term "domestic substitution" has been worn down to near meaninglessness over the past three years. But after the morning of April 24, for the first time, the matter came with concrete data the capital market could price: single-card throughput, end-to-end inference latency, inference cost, a deployable codebase. Quietly, these factors have pushed a long-running war of words to the threshold of production.
This is the logic behind Cambricon's 11 consecutive days of gains: it is no longer just a "domestic GPU concept stock" but a "DeepSeek V4 inference infrastructure supplier." The same logic explains the 12% surge in SMIC's Hong Kong shares, since it fabricates the 950PR on a 7nm-equivalent process. Every V4 token that runs on a domestic Ascend chip means some of the capacity originally destined for NVIDIA and TSMC stays in the Pearl River Delta.
And the next steps are already laid out. On Huawei's roadmap, the 950DT (training version) is slated for delivery in the fourth quarter of 2026, with the goal of "full-stack training of a V5 or equivalent-level model on a 10,000-card cluster." If that path proves out, CUDA's moat in Chinese large-model training shifts from "essential" to "optional."