
MiniMax Open-Sources Attention Library Built for Blackwell; M3 Weights Expected This Friday

According to PerfX Beating monitoring, Ryan Lee, Developer Relations Lead at MiniMax, announced that MiniMax Sparse Attention (MSA), a high-performance attention library for NVIDIA Blackwell (SM100) GPUs, has been officially open-sourced under the MIT license. Lee added that the MiniMax-M3 weights are expected to be released this Friday.

MSA is used for million-token-context inference in MiniMax-M3: it selects the most relevant KV blocks for each GQA group and computes attention only over the selected blocks. According to the paper, at a context length of 1 million tokens and compared with dense GQA in the same configuration, MSA reduces attention computation by a factor of 28.4 and achieves a 14.2x prefill speedup and a 7.6x decoding speedup on the H800 GPU.
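The block-selection idea can be illustrated with a minimal NumPy sketch. This is not MSA's code or API: the function name `block_sparse_attention`, the mean-key block summary, and the rule of summing scores across the query heads of a group are all assumptions made for illustration, and the sketch covers only a single decoding step for one GQA group.

```python
import numpy as np

def block_sparse_attention(q, k, v, block_size, top_k):
    """Toy top-k block attention for one GQA group (illustrative only).

    q: (G, d) query heads that share one KV head
    k, v: (T, d) shared keys and values, with T divisible by block_size
    """
    T, d = k.shape
    assert T % block_size == 0, "sequence length must be a multiple of block_size"
    n_blocks = T // block_size
    k_blocks = k.reshape(n_blocks, block_size, d)
    v_blocks = v.reshape(n_blocks, block_size, d)

    # Assumed scoring rule: summarise each block by its mean key, then
    # sum the scores over the group's query heads so the whole group
    # shares one block selection.
    block_keys = k_blocks.mean(axis=1)                # (n_blocks, d)
    block_scores = (q @ block_keys.T).sum(axis=0)     # (n_blocks,)

    # Keep the top_k highest-scoring blocks, in their original order.
    top_k = min(top_k, n_blocks)
    selected = np.sort(np.argsort(block_scores)[-top_k:])

    # Softmax attention restricted to tokens in the selected blocks.
    k_sel = k_blocks[selected].reshape(-1, d)
    v_sel = v_blocks[selected].reshape(-1, d)
    logits = (q @ k_sel.T) / np.sqrt(d)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v_sel, selected
```

When `top_k` equals the number of blocks, the sketch reduces to ordinary dense attention; with a smaller `top_k`, the attention cost scales with `top_k * block_size` rather than with the full context length, which is the source of the savings described above.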

The open-source release bundles both C++ JIT and CuTe-DSL implementations in a single Python package and provides dense FlashAttention and sparse top-k attention kernels, with support for precision formats including BF16, FP8, NVFP4, and FP4. It currently targets mainly the NVIDIA Blackwell (SM100) GPU.
