According to monitoring by Perceive Beating, DeepSeek has released an open-source preview of its V4 series under the MIT license, with weights available on Hugging Face and ModelScope. The series includes two MoE models: V4-Pro, with 1.6 trillion total parameters and 49 billion activated per token, and V4-Flash, with 284 billion total parameters and 13 billion activated. Both models support a 1-million-token context window.
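The sparse-activation ratios implied by those figures are easy to check. A quick sketch (the helper function is hypothetical, only the parameter counts come from the announcement):

```python
# Hypothetical helper: compute what fraction of an MoE model's weights
# is activated per token, using the parameter counts reported for V4.
def active_fraction(total_b: float, active_b: float) -> float:
    """Return the activated-parameter share (both arguments in billions)."""
    return active_b / total_b

# V4-Pro: 1.6T total, 49B activated per token (~3% of weights)
pro = active_fraction(1600, 49)
# V4-Flash: 284B total, 13B activated per token (~4.6% of weights)
flash = active_fraction(284, 13)
print(f"V4-Pro activates {pro:.1%}, V4-Flash activates {flash:.1%} per token")
```

Despite being the smaller model, V4-Flash activates a slightly larger share of its weights per token, a common pattern when shrinking MoE models.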
The release brings three architecture upgrades. First, a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavy Compression Attention (HCA) sharply cuts long-context overhead: at a 1-million-token context, V4-Pro's per-token inference FLOPs are only 27% of V3.2's, and its KV cache (the memory that stores historical context during inference) is only 10% of V3.2's. Second, Manifold-Constrained HyperConnections (mHC) replace traditional residual connections to stabilize cross-layer signal propagation. Third, training has moved to the Muon optimizer for faster convergence. The pre-training data exceeds 32 trillion tokens.
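Muon's distinguishing step is orthogonalizing each weight-gradient matrix with a Newton-Schulz iteration before applying it. A minimal NumPy sketch of that step, with coefficients taken from the public Muon reference implementation (this is an illustration of the technique, not DeepSeek's training code):

```python
import numpy as np

def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5) -> np.ndarray:
    """Approximately push g's singular values toward 1 via a quintic
    Newton-Schulz iteration, the core operation of the Muon optimizer."""
    a, b, c = 3.4445, -4.7750, 2.0315   # quintic coefficients from Muon's reference code
    x = g / (np.linalg.norm(g) + 1e-7)  # normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:                       # work with the wide orientation so
        x = x.T                          # x @ x.T is the smaller square matrix
    for _ in range(steps):
        aat = x @ x.T
        x = a * x + (b * aat + c * aat @ aat) @ x
    return x.T if transposed else x
```

In Muon this replaces the element-wise scaling of Adam-style optimizers: the update keeps the gradient's directional structure while equalizing its singular values.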
Post-training proceeds in two stages: domain experts are first trained separately with SFT and GRPO reinforcement learning, then merged into a single model via online distillation. V4-Pro-Max (the maximum reasoning-effort mode) is claimed to be the strongest open-source model available, matching the top tier on coding benchmarks and substantially narrowing the gap with closed-source frontier models on reasoning and agent tasks. V4-Flash-Max approaches Pro-level reasoning performance when given a sufficient thinking budget, but its smaller parameter count limits it on pure-knowledge and complex agent tasks. Weights are stored in mixed FP4+FP8 precision.
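GRPO's defining trick is that it needs no value network: each sampled completion is scored against the other completions for the same prompt, using group-relative advantages. A minimal sketch of that computation (reward values are made up for illustration):

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages as in GRPO: normalize each completion's
    reward by the mean and std of its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Four completions sampled for one prompt, scored by a reward model
print(grpo_advantages([0.1, 0.4, 0.9, 0.2]))
```

The advantages sum to zero within each group, so above-average completions are reinforced and below-average ones suppressed without any learned baseline.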
