
MiniMax M3 officially open-sourced, with native multimodality and a 1-million-token context

According to monitoring by Dynamic Vision Beating, Chinese large-model developer MiniMax has officially released the weights of MiniMax M3, a natively multimodal mixture-of-experts (MoE) model, on Hugging Face. MiniMax M3 has 428 billion total parameters, of which 230 billion are activated per token, and natively supports a 1-million-token context window. To reduce GPU memory overhead at deployment, the development team simultaneously released an MXFP8-quantized version and adapted the model for mainstream inference frameworks such as SGLang, vLLM, and Transformers.
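To give a rough sense of why the quantized release matters for GPU memory, the weight footprint can be estimated from the reported parameter count. The sketch below is back-of-the-envelope arithmetic and not an official figure. It assumes a 16-bit (BF16) baseline, assumes every weight is quantized, and uses the MXFP8 layout from the OCP Microscaling specification (8-bit elements plus one shared 8-bit scale per block of 32). It ignores the KV cache and activations. The function name `weight_gigabytes` is illustrative only.

```python
# Back-of-the-envelope weight memory estimate (assumptions, not official data).
TOTAL_PARAMS = 428e9  # total parameter count reported in the article

# MXFP8 per the OCP Microscaling format: 8-bit elements,
# plus one shared 8-bit scale for every block of 32 elements.
MX_BLOCK_SIZE = 32
MXFP8_BITS_PER_PARAM = 8 + 8 / MX_BLOCK_SIZE  # 8.25 bits on average
BF16_BITS_PER_PARAM = 16  # assumed unquantized baseline


def weight_gigabytes(num_params: float, bits_per_param: float) -> float:
    """Return the weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


bf16_gb = weight_gigabytes(TOTAL_PARAMS, BF16_BITS_PER_PARAM)
mxfp8_gb = weight_gigabytes(TOTAL_PARAMS, MXFP8_BITS_PER_PARAM)

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"MXFP8 weights: {mxfp8_gb:.1f} GB")
print(f"Reduction:     {bf16_gb / mxfp8_gb:.2f}x")
```

Under these assumptions the weights alone shrink from roughly 856 GB to about 441 GB, a little under half. Real deployments would differ if some layers are kept at higher precision.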

In terms of multimodal design, MiniMax M3 is trained jointly on text, images, and video during pre-training to achieve native semantic fusion, rather than aligning modalities in the post-training phase. In operation, the model offers two reasoning modes: a Thinking mode for complex logic and tool orchestration, and a Non-thinking mode for low-latency dialogue and code generation.

The million-token context is powered by MiniMax Sparse Attention (MSA), a lightweight attention kernel library open-sourced alongside the model. According to official data, MSA employs a Grouped Query Attention (GQA) chunked retrieval mechanism. In real-world tests with a 1-million-token context, the MSA operator optimized for the NVIDIA Blackwell (SM100) architecture achieves more than 9x faster prefill and 15x faster decoding than conventional full attention, while significantly reducing inference overhead.
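The article does not describe MSA's internals, so the following is only a generic illustration of what chunked retrieval attention over a GQA group can look like. It is not MSA's algorithm or API. It assumes that each chunk of the key cache is summarized by its mean key, that the query heads sharing one KV head pick their chunks jointly, and that ordinary softmax attention then runs over the selected chunks only. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Plain softmax attention of one query over the given keys and values."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def chunked_sparse_attend(group_queries, keys, values, chunk_size, top_k):
    """Hypothetical chunk-retrieval attention for one GQA group.

    group_queries holds the query vectors of the heads that share one
    KV head; keys and values are that head's cached vectors.
    """
    dim = len(keys[0])
    # Split the KV cache into contiguous chunks of at most chunk_size tokens.
    chunks = [(s, min(s + chunk_size, len(keys)))
              for s in range(0, len(keys), chunk_size)]
    # Summarize each chunk by its mean key (an assumption of this sketch).
    summaries = [[sum(k[i] for k in keys[s:e]) / (e - s) for i in range(dim)]
                 for s, e in chunks]
    # Score each chunk once per group by summing over the group's queries.
    scores = [sum(dot(q, summ) for q in group_queries) for summ in summaries]
    ranked = sorted(range(len(chunks)), key=lambda c: scores[c], reverse=True)
    selected = sorted(ranked[:top_k])  # keep the original token order
    idx = [i for c in selected for i in range(*chunks[c])]
    sel_keys = [keys[i] for i in idx]
    sel_values = [values[i] for i in idx]
    # Each query head attends only to the retrieved chunks.
    return [attend(q, sel_keys, sel_values) for q in group_queries]
```

In this sketch each query touches `top_k * chunk_size` tokens instead of the full sequence, which is the general source of savings in sparse attention at long context lengths. Setting `top_k` to the total number of chunks reproduces full attention exactly.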
