
Decoding the Hand: MiniMax Releases M2 Technical Report Detailing Its MoE Base and Agent Training System

According to monitoring by Perceive Beating, following its preview of M3 sparse attention, MiniMax has submitted a 35-page technical report on the M2 series to arXiv. The report is the first systematic documentation of the flagship model's MoE architecture design, data pipeline, and Agent-native infrastructure.


The report covers M1's early exploration of hybrid linear attention and M2's decision to return to full attention combined with GQA (grouped-query attention), and shows how MTP (multi-token prediction), Sigmoid routing, and Forge address shortcomings on both the inference and training sides. It also discloses for the first time Forge, the ultra-long-context Agent RL training system, and M2.7's self-evolution mechanism. In particular, Forge introduces windowed FIFO scheduling and prefix-tree merging, achieving up to a 40x training speedup in long-sequence Agent reinforcement learning.
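To give a rough sense of why merging shared prefixes can save work in Agent RL, here is a minimal sketch. It is not Forge's implementation, which this summary does not describe. It assumes that rollouts are plain lists of token IDs and that a token position shared by several rollouts only needs to be processed once. The names `TrieNode`, `merged_token_count`, and `naive_token_count` are hypothetical.

```python
from typing import Dict, Sequence


class TrieNode:
    """One token position, shared by every rollout that passes through it."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}


def merged_token_count(rollouts: Sequence[Sequence[int]]) -> int:
    """Count token positions that remain after merging common prefixes."""
    root = TrieNode()
    count = 0
    for tokens in rollouts:
        node = root
        for tok in tokens:
            if tok not in node.children:
                # First time this prefix is seen: a new position to process.
                node.children[tok] = TrieNode()
                count += 1
            node = node.children[tok]
    return count


def naive_token_count(rollouts: Sequence[Sequence[int]]) -> int:
    """Count token positions when every rollout is processed separately."""
    return sum(len(tokens) for tokens in rollouts)


# Three toy rollouts that branch from a shared context (illustrative data).
rollouts = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 6]]
print(naive_token_count(rollouts), merged_token_count(rollouts))
```

In this toy case the three rollouts contain 11 token positions when handled separately and 6 after merging. The longer the shared context relative to the branching suffixes, the larger the saving, which is the regime long-sequence Agent rollouts tend to be in.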


Furthermore, M2.7 demonstrates for the first time a closed "self-evolution" loop for an agent: it can independently complete more than 100 rounds of analysis, code modification, evaluation runs, and rollbacks, yielding a 30% improvement on internal evaluations.
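The general shape of such a modify, evaluate, and roll back loop can be sketched as follows. This is only an illustration of the pattern under simple assumptions, not M2.7's actual mechanism, which this summary does not detail. The function `self_improve` and its `propose` and `evaluate` callbacks are hypothetical, and the "code" being modified is stood in for by a single integer.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def self_improve(
    initial: T,
    propose: Callable[[T], T],
    evaluate: Callable[[T], float],
    rounds: int,
) -> Tuple[T, float, List[str]]:
    """Keep a proposed change only if it scores higher; otherwise roll back."""
    best, best_score = initial, evaluate(initial)
    log: List[str] = []
    for _ in range(rounds):
        candidate = propose(best)      # analysis + modification step
        score = evaluate(candidate)    # evaluation run
        if score > best_score:
            best, best_score = candidate, score
            log.append("kept")
        else:
            # Rollback: discard the candidate and continue from `best`.
            log.append("rolled back")
    return best, best_score, log


# Toy stand-ins: the "system" is an integer and the ideal value is 3.
best, score, log = self_improve(
    initial=0,
    propose=lambda x: x + 1,
    evaluate=lambda x: -float((x - 3) ** 2),
    rounds=5,
)
print(best, score, log)
```

Here the first three proposals improve the score and are kept, while the last two overshoot and are rolled back, so the loop ends at the best state it found.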


In terms of performance, M2.7, which activates 9.8B parameters per token, scores 56.22% on SWE-Pro and 52.7% on Multi-SWE-bench, and achieves an average medal rate of 66.6% on MLE Bench, approaching Gemini 3.1.
