According to monitoring by Recon AI, Cambricon announced that it had completed the adaptation of two models, the 285B DeepSeek-V4-Flash and the 1.6T DeepSeek-V4-Pro, on the day of the V4 release. The adaptation is built on the vLLM inference framework, and the code has been open-sourced on GitHub.
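For context, bringing a newly adapted model up under vLLM takes only a few lines of Python. The sketch below uses vLLM's standard offline-inference API; the model identifier and the presence of a Cambricon MLU backend in the installed build are assumptions for illustration, since the announcement does not specify either.

```python
# Minimal vLLM bring-up sketch. The model path below is hypothetical, and a
# vLLM build with Cambricon MLU support is assumed; the actual open-sourced
# adaptation may differ.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical model identifier
    tensor_parallel_size=8,                 # shard the 285B model across devices
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain mixture-of-experts routing."], params)
for out in outputs:
    print(out.outputs[0].text)
```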
This speed of adaptation rests on two prerequisites. First, Cambricon's self-developed NeuWare software stack natively supports PyTorch, vLLM, and other mainstream frameworks, so models can be migrated quickly. Second, Cambricon chips natively support mainstream low-precision data formats, so precision can be verified without an additional format-conversion step. For V4's new architecture, Cambricon accelerated modules such as Compressor and mHC through Torch-MLU-Ops, its self-developed fused-operator library, and hand-wrote performance-critical kernels such as sparse/compressed Attention and GroupGemm in BangC, its MLU programming language.
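To make concrete what a fused-operator library contributes, the sketch below shows a generic unfused residual-add plus RMSNorm pattern and the single fused call a library like Torch-MLU-Ops would replace it with. The torch_mlu_ops module and function name are hypothetical (the announcement names the library but not its API), and the pattern is a generic transformer building block, not Cambricon's actual Compressor or mHC code.

```python
# Why operator fusion matters: in eager mode, each elementwise step below
# launches its own kernel and round-trips through device memory. A fused
# operator collapses them into one kernel.
import torch

def rmsnorm_residual_unfused(x, residual, weight, eps=1e-6):
    # Several separate kernels in eager mode: add, square, mean, rsqrt, scale.
    h = x + residual
    var = h.pow(2).mean(dim=-1, keepdim=True)
    return h * torch.rsqrt(var + eps) * weight

# With a fused-operator library, the same computation becomes a single call:
#   import torch_mlu_ops  # hypothetical module name
#   out = torch_mlu_ops.fused_rmsnorm_residual(x, residual, weight, eps=1e-6)
# eliminating kernel-launch overhead and intermediate memory traffic.

x, residual = torch.randn(4, 1024), torch.randn(4, 1024)
weight = torch.ones(1024)
print(rmsnorm_residual_unfused(x, residual, weight).shape)  # torch.Size([4, 1024])
```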
At the inference-framework level, Cambricon's vLLM port supports five-dimensional mixed parallelism (tensor, pipeline, sequence, data, and expert parallelism, i.e. TP/PP/SP/DP/EP), communication-computation overlap, low-precision quantization, and prefill-decode (PD) disaggregated deployment; a deployment sketch follows at the end of this section. Notably, the V4 technical report mentions validation only on NVIDIA GPUs and Huawei Ascend NPUs, without reference to the Cambricon platform, so this adaptation was completed by Cambricon independently. Buoyed by the V4 release news, the A-share domestic chip sector strengthened, and Cambricon's stock spiked sharply during intraday trading.
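As for the deployment sketch referenced above, the snippet shows how several of the parallelism dimensions are expressed through vLLM's Python API. Argument names follow upstream vLLM conventions; whether Cambricon's port keeps identical names, and which degrees suit a 1.6T model, are assumptions. Sequence parallelism and PD-disaggregated serving are typically configured at the serving layer rather than per engine, so they do not appear here.

```python
# Sketch of a mixed-parallel vLLM deployment. Parallelism degrees and the
# model identifier are illustrative assumptions, not values from the
# announcement.
from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Pro",  # hypothetical model identifier
    tensor_parallel_size=8,       # TP: shard each weight matrix across 8 devices
    pipeline_parallel_size=4,     # PP: split the layer stack into 4 stages
    enable_expert_parallel=True,  # EP: distribute MoE experts across ranks
    quantization="fp8",           # low-precision inference path
)
```

In a PD-disaggregated setup, prefill and decode would run as separate engine instances with KV-cache transfer between them, configured at launch rather than in a snippet like this.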
