DeepSeek V4 Released: 1.6T-Parameter Flagship Supports 1M-Token Context, Per-Token Inference FLOPs Only 27% of V3.2

According to monitoring by Perceive Beating, DeepSeek has open-sourced a preview release of its V4 series under the MIT license, with weights published on Hugging Face and ModelScope. The series comprises two Mixture-of-Experts (MoE) models: V4-Pro, with 1.6T (1.6 trillion) total parameters and 49B (49 billion) activated per token, and V4-Flash, with 284B (284 billion) total parameters and 13B (13 billion) activated per token. Both models support a context window of 1M tokens.
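
As a quick check on how sparse these MoE models are, the ratio of activated to total parameters can be computed directly from the announced figures. This is a minimal Python sketch; `MODELS` and `activation_ratio` are illustrative names, and the calculation ignores how the activated parameters split between shared and routed experts, which the announcement does not specify.

```python
# Announced figures: total parameters and parameters activated per token.
MODELS = {
    "V4-Pro": {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9, "active": 13e9},
}


def activation_ratio(total: float, active: float) -> float:
    """Fraction of all weights that participate in one token's forward pass."""
    return active / total


for name, params in MODELS.items():
    ratio = activation_ratio(params["total"], params["active"])
    print(f"{name}: {ratio:.2%} of parameters active per token")
```

By this measure V4-Pro uses roughly 3.1% of its weights for each token and V4-Flash roughly 4.6%.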

The release introduces three architectural upgrades. First, a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavy Compressed Attention (HCA) substantially reduces long-context overhead: at 1M context, V4-Pro's per-token inference FLOPs are only 27% of V3.2's, and its KV cache (the memory that holds the keys and values of earlier tokens during inference) is only 10% of V3.2's. Second, Manifold-Constrained Hyper-Connections (mHC) replace conventional residual connections to make signal propagation across layers more stable. Third, training has switched to the Muon optimizer for faster convergence. The pretraining corpus exceeds 32T tokens.
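
The announcement does not describe DeepSeek's Muon configuration. For readers unfamiliar with the optimizer, the sketch below follows the publicly released Muon reference implementation: SGD-style momentum whose update matrix is approximately orthogonalized with a quintic Newton-Schulz iteration before it is applied. The function names, learning rate, and momentum value are illustrative assumptions, not DeepSeek's settings.

```python
import numpy as np

# Quintic Newton-Schulz coefficients from the public Muon reference implementation.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately map a 2-D matrix G = U S V^T toward U V^T (singular values pushed near 1)."""
    a, b, c = NS_COEFFS
    x = g / (np.linalg.norm(g) + eps)  # Frobenius-normalize so every singular value is <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the wide orientation so the Gram matrix is the smaller one
    for _ in range(steps):
        gram = x @ x.T
        poly = b * gram + c * gram @ gram
        x = a * x + poly @ x  # applies s -> a*s + b*s^3 + c*s^5 to each singular value s
    return x.T if transposed else x


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One simplified Muon update for a single 2-D weight matrix (Nesterov-style momentum)."""
    momentum_buf = beta * momentum_buf + grad
    update = newton_schulz_orthogonalize(grad + beta * momentum_buf)
    # Shape-dependent scale used in the public reference code (enlarges steps for tall matrices).
    scale = max(1.0, weight.shape[0] / weight.shape[1]) ** 0.5
    return weight - lr * scale * update, momentum_buf
```

In the public reference code, Muon is applied only to 2-D hidden-layer weight matrices; embeddings, output heads, and vector or scalar parameters are typically updated with AdamW instead.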

Post-training proceeds in two stages: first, domain-specific expert models are trained separately with supervised fine-tuning (SFT) and GRPO reinforcement learning; these experts are then merged into a single model through online distillation. DeepSeek claims that V4-Pro-Max (the maximum reasoning-effort mode) is currently the strongest open-source model, achieving top-tier results on coding benchmarks and substantially narrowing the gap with frontier closed-source models on reasoning and agentic tasks. Given a sufficient thinking budget, V4-Flash-Max reaches reasoning performance close to that of Pro, but its smaller parameter count limits it on pure-knowledge and complex agentic tasks. The weights are stored in mixed FP4 and FP8 precision.
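
The announcement gives no details of DeepSeek's reward design, group size, or regularization for this stage. As a hedged illustration of the core GRPO idea, as published in DeepSeek's earlier DeepSeekMath work, the sketch below normalizes each sampled response's reward against the other responses to the same prompt and feeds the resulting advantages into a PPO-style clipped objective. The function names are hypothetical, the KL penalty toward a reference model is omitted, and each response is summarized by a single log-probability for brevity.

```python
import numpy as np


def group_relative_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    """Outcome advantages: each reward normalized by its own group's mean and std.

    Uses the population standard deviation (ddof=0); implementations differ on this.
    """
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)


def grpo_clipped_objective(logp_new, logp_old, advantages, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate averaged over the group (to be maximized; KL term omitted)."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return float(np.minimum(unclipped, clipped).mean())
```

Because advantages are computed relative to the group mean, GRPO needs no separate value (critic) network, and a group in which every response receives the same reward contributes zero advantage.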