
"Huawei Chips Delayed the DeepSeek V4 Launch"? Same Kernel Validated on Both NVIDIA and Ascend, with Nearly 2x Speedup

According to Dynamic Insight Beating's monitoring, before DeepSeek V4 was released, a widely circulated rumor in the community held that the launch was delayed because the model was difficult to port from NVIDIA hardware to Huawei's Ascend platform. The V4 technical report does not address this rumor directly, but the performance data it discloses contradicts it.

The report states that V4's fine-grained Expert Parallelism (EP) scheme has been deployed and validated on both NVIDIA GPUs and Huawei Ascend NPUs. It speeds up regular inference workloads by 1.50x to 1.73x, and latency-sensitive scenarios such as RL rollout and high-speed agent services see speedups of up to 1.96x. The team has open-sourced the CUDA version of the kernel, MegaMoE, as part of DeepGEMM. In other words, V4 runs close to the theoretical efficiency limit on both hardware platforms, and cross-platform adaptation has not caused any performance loss.
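
To put the quoted speedup factors in more familiar terms, the short sketch below converts each one into the share of baseline time that remains. It is only arithmetic on the three figures cited above; the helper name `time_fraction` and the scenario labels are illustrative and do not come from the report.

```python
def time_fraction(speedup: float) -> float:
    """Return the fraction of baseline wall-clock time left after a speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# Speedup factors quoted in the article (labels are illustrative).
quoted = [
    ("regular inference, low end", 1.50),
    ("regular inference, high end", 1.73),
    ("latency-sensitive, maximum", 1.96),
]

for label, speedup in quoted:
    remaining = time_fraction(speedup)
    print(f"{label}: {speedup:.2f}x -> {remaining:.1%} of baseline time, "
          f"{1 - remaining:.1%} saved")
```

Read this way, the 1.96x maximum means the same work finishes in roughly half the original time, while the 1.50x low end saves about one third.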
