According to monitoring by Perceive Beating, DeepSeek has released a preview of its open-source V4 series under the MIT license on Hugging Face and ModelScope. The series includes two MoE models: V4-Pro, with 1.6T total parameters and 49B activated per token, and V4-Flash, with 284B total parameters and 13B activated per token. Both models support a 1M-token context window.
Three architectural upgrades stand out. First, a hybrid attention mechanism (Compressed Sparse Attention, CSA, plus Heavy Compressed Attention, HCA) sharply reduces long-context overhead: at 1M context, V4-Pro's per-token inference FLOPs are only 27% of V3.2's, and its KV cache (the memory that stores attention history during inference) only 10%. Second, manifold-constrained hyperconnections (mHC) replace traditional residual connections to stabilize signal propagation across layers. Third, training has moved to the Muon optimizer for faster convergence. The pretraining corpus exceeds 32T tokens.
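To make the KV-cache claim concrete, here is a back-of-envelope sizing sketch comparing a full-attention baseline against a compressed design that stores one small latent per token. Every layer count, head count, and dimension below is a hypothetical illustration (none are published V4 figures); the point is only how a compressed, lower-precision cache can land near 10% of the baseline at 1M tokens.

```python
# Back-of-envelope KV-cache sizing. All model dimensions here are
# made up for illustration; they are NOT DeepSeek's actual configs.

def kv_cache_bytes(tokens, layers, per_token_floats, bytes_per_float):
    """Total cache size: one entry of `per_token_floats` per token per layer."""
    return tokens * layers * per_token_floats * bytes_per_float

TOKENS = 1_000_000   # 1M-token context
LAYERS = 60          # hypothetical layer count

# Baseline: grouped-query KV cache, e.g. 8 KV heads x 128 dims x 2 (K and V),
# stored in FP16 (2 bytes per value).
baseline = kv_cache_bytes(TOKENS, LAYERS, 8 * 128 * 2, 2)

# Compressed: a single shared 512-float latent per token, from which K and V
# are reconstructed on the fly, stored in FP8 (1 byte per value).
compressed = kv_cache_bytes(TOKENS, LAYERS, 512, 1)

print(f"baseline:   {baseline / 2**30:.1f} GiB")   # ~228.9 GiB
print(f"compressed: {compressed / 2**30:.1f} GiB") # ~28.6 GiB
print(f"ratio:      {compressed / baseline:.1%}")  # 12.5%
```

Under these toy numbers the compressed cache is 12.5% of the baseline; the reported 10% figure for V4-Pro presumably reflects a more aggressive latent size or precision.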
Post-training proceeds in two stages: domain-specific experts are first trained separately with SFT and GRPO reinforcement learning, then merged into a single model via online distillation. V4-Pro-Max (the maximum reasoning-effort mode) is claimed to be the strongest open-source model to date, topping coding benchmarks and significantly narrowing the gap with the closed-source frontier on reasoning and agent tasks. V4-Flash-Max approaches Pro's reasoning performance when given a sufficient thinking budget, but its smaller parameter count limits it on pure-knowledge and complex agent tasks. Weights are stored in mixed FP4+FP8 precision.
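The merge step above typically optimizes a distillation objective: the student model is trained to match each expert teacher's output distribution, commonly via a KL-divergence loss. The toy sketch below illustrates that objective in pure Python; the logits are invented for illustration and no real model is involved.

```python
# Toy sketch of a distillation loss: KL(teacher || student) over output
# distributions. The logits are fabricated; this only shows the objective
# that an online-distillation merge would minimize per token.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# One domain expert (teacher) and the merged student, on the same prompt.
teacher_logits = [2.0, 1.0, 0.5, -1.0]
student_logits = [1.8, 1.1, 0.4, -0.8]

loss = kl_divergence(softmax(teacher_logits), softmax(student_logits))
print(f"distillation loss: {loss:.4f}")  # small when student tracks teacher

# A student that exactly matches the teacher drives the loss to zero.
assert kl_divergence(softmax(teacher_logits), softmax(teacher_logits)) == 0.0
```

"Online" distillation differs from the sketch only in that teacher targets are generated on the fly during student training rather than precomputed; the per-token objective is the same.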
