According to monitoring by Dynamic Insight Beating, a widely circulated speculation in the community prior to the release of DeepSeek V4 was that the launch was delayed by difficulties in adapting the model from NVIDIA hardware to the Huawei Ascend platform. Although the V4 technical report does not address this rumor directly, the performance data it discloses clearly contradicts it.
The report states that V4's fine-grained Expert Partitioning (EP) scheme has been deployed and validated on both NVIDIA GPUs and Huawei Ascend NPUs, speeding up regular inference workloads by a factor of 1.50 to 1.73, while latency-sensitive scenarios such as RL rollout and high-speed agent serving reach a maximum speedup of 1.96x. The team has open-sourced the CUDA kernels, MegaMoE, as part of DeepGEMM. In other words, V4 comes close to the theoretical efficiency limit on both hardware stacks, and the cross-platform adaptation incurred no performance loss.
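To give a rough intuition for what expert partitioning means in a Mixture-of-Experts layer, the toy sketch below distributes experts across devices and groups tokens by the device that hosts their routed expert. This is a simplified illustration, not DeepSeek's actual scheme: the function names and the round-robin placement are assumptions for exposition, and real systems like the MegaMoE kernels fuse routing and the expert GEMMs on-device rather than shuffling Python lists.

```python
# Toy illustration of expert partitioning (EP) for a MoE layer.
# Hypothetical helpers; a production EP scheme also balances load,
# overlaps communication with compute, and runs fused device kernels.

def partition_experts(num_experts, num_devices):
    """Round-robin assignment of expert ids to devices (one simple policy)."""
    placement = {d: [] for d in range(num_devices)}
    for e in range(num_experts):
        placement[e % num_devices].append(e)
    return placement

def route_tokens(token_expert_ids, placement):
    """Group token indices by the device hosting each token's routed expert."""
    expert_to_device = {e: d for d, experts in placement.items() for e in experts}
    per_device = {d: [] for d in placement}
    for tok, e in enumerate(token_expert_ids):
        per_device[expert_to_device[e]].append(tok)
    return per_device

placement = partition_experts(num_experts=8, num_devices=4)
# Each token has already been routed to one expert by the gating network.
batches = route_tokens([0, 5, 2, 7, 1], placement)
print(placement)  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
print(batches)    # {0: [0], 1: [1, 4], 2: [2], 3: [3]}
```

The speedup in the report comes from the fact that, once tokens are batched per device like this, each device runs only its own experts' matrix multiplications in parallel instead of every device materializing all experts.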
