According to monitoring by Vision One Beating, Princeton Ph.D. student Yifan Zhang has posted technical details of DeepSeek V4 on X. On April 19 he teased "V4 next week" and listed the names of three architecture components; tonight he revealed a complete parameter table and disclosed for the first time the existence of a lightweight version, V4-Lite, with 285B parameters.
V4 has 1.6T total parameters. The attention mechanism is DSA2, which combines two sparse-attention schemes DeepSeek has used before: DSA (DeepSeek Sparse Attention), introduced in V3.2, and NSA (Native Sparse Attention), proposed in a paper earlier this year. The head dimension is 512, complemented by Sparse MQA and SWA (Sliding Window Attention). Each MoE layer consists of 384 experts, with 6 active per token, executed via the Fused MoE Mega-Kernel. Residual connections use Hyper-Connections.
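As a rough illustration only (not DeepSeek's actual kernel, whose details are not public), top-k expert routing for an MoE layer with 384 experts and 6 active per token can be sketched as:

```python
import numpy as np

def topk_moe_route(hidden, gate_weights, k=6):
    """Pick the top-k experts for one token and weight them.

    hidden: (d,) token activation; gate_weights: (num_experts, d) router matrix.
    This is a generic top-k MoE routing sketch, not DeepSeek's implementation.
    """
    logits = gate_weights @ hidden
    topk = np.argpartition(logits, -k)[-k:]        # indices of the k largest logits
    w = np.exp(logits[topk] - logits[topk].max())  # softmax over the selected experts only
    w /= w.sum()
    return topk, w

rng = np.random.default_rng(0)
hidden = rng.standard_normal(64)
gate = rng.standard_normal((384, 64))  # 384 experts, as in the rumored spec
experts, weights = topk_moe_route(hidden, gate, k=6)
```

The token's output would then be the weighted sum of the six selected experts' outputs; a fused mega-kernel combines this routing with the expert matmuls in one GPU launch to avoid dispatch overhead.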
Details revealed for the first time on the training side include: the optimizer is Muon (a matrix-level optimizer that applies Newton-Schulz orthogonalization to momentum updates); the pre-training context length is 32K, later expanded to a final 1M; and the reinforcement-learning phase uses GRPO with a KL-divergence correction. The model is text-only.
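The Newton-Schulz orthogonalization step at the heart of Muon can be sketched as follows. The quintic coefficients below come from the public open-source Muon implementation and are an assumption here; the rumor only names the optimizer, not its settings:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize a momentum matrix, as in Muon.

    Iterates X <- aX + (b*A + c*A@A)@X with A = X@X^T, which pushes all
    singular values of X toward 1 while keeping its singular vectors.
    Coefficients are from the public Muon code (an assumption here).
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)  # scale so all singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # work with the smaller Gram matrix X @ X^T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```

The resulting matrix replaces the raw momentum in the weight update, so every direction in the update gets roughly equal magnitude regardless of how skewed the gradient's spectrum is.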
Zhang is not affiliated with DeepSeek, and DeepSeek has not officially responded to any of the above information.
