
Nous Open-Sources Lighthouse Attention: 17x Faster at 512K Cache on a Single B200

According to Dongcha Beating monitoring, Nous Research has open-sourced Lighthouse Attention, a long-context pre-training mechanism. When processing 512K-length text on a single B200 GPU, its attention computation runs about 17 times faster than the traditional mechanism, and it delivers a 1.4x to 1.7x end-to-end training speedup at a length of 98K.

The traditional attention mechanism computes pairwise relationships between all tokens, so as the text gets longer, the compute required grows quadratically. Lighthouse Attention takes a different approach: screen first, then refine. It rapidly scans compressed summaries of the text at multiple levels, selects the highest-scoring core segments to form a short sequence, and then processes that sequence directly with the efficient FlashAttention kernel. Because the screening logic is fully decoupled from the kernel, developers are spared writing low-level code and need no extra training objectives.
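The screen-then-refine idea can be sketched in miniature. This is an illustrative toy, not Nous's implementation: scalar "keys" stand in for the compressed summaries, the scoring rule and function names are assumptions, and the gathered short sequence is what a dense kernel like FlashAttention would then consume.

```python
# Toy sketch of "screen first, then refine" (not Nous's code).
# Fixed-size blocks are scored by a cheap summary statistic; only the
# top-scoring blocks survive, so exact attention later runs over a
# much shorter sequence.

def select_blocks(keys, query, block_size, keep):
    """Score each block by mean similarity to the query (an assumed,
    simplified stand-in for summary scoring) and return the start
    indices of the top-`keep` blocks, in original order."""
    scores = []
    for b in range(0, len(keys), block_size):
        block = keys[b:b + block_size]
        s = sum(k * query for k in block) / len(block)
        scores.append((s, b))
    scores.sort(reverse=True)
    return sorted(b for _, b in scores[:keep])

def gather(keys, starts, block_size):
    """Concatenate the selected blocks into one short sequence that a
    dense attention kernel could process directly."""
    out = []
    for b in starts:
        out.extend(keys[b:b + block_size])
    return out

keys = [0.1, 0.2, 5.0, 4.0, 0.0, 0.1, 3.0, 2.0]  # toy 1-D "keys"
starts = select_blocks(keys, query=1.0, block_size=2, keep=2)
short = gather(keys, starts, block_size=2)
# half the sequence is dropped before the expensive attention step
```

The decoupling the article mentions lives in this split: the screening step only produces indices, so any off-the-shelf dense kernel can handle the refined pass unchanged.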

Past acceleration schemes built on a similar idea often had a side effect: once the model gets used to skipping while reading, it can lose its original ability to attend token by token. To avoid this pitfall, the research team runs most of the training in the accelerated mode and switches back to traditional full attention only for a brief adaptation phase at the end. In tests on a 5.3-billion-parameter model trained on 500 billion tokens, the model trained this way not only cut training time significantly but ultimately matched or even outperformed the baseline trained with the traditional method throughout.
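The two-phase schedule described above can be expressed as a tiny step-to-mode function. The function name and the 2% adaptation fraction are assumptions for illustration; the article only says the full-attention phase is brief and comes at the end.

```python
# Hypothetical schedule sketch (names and fractions are assumptions,
# not Nous's API): accelerated sparse attention for the bulk of
# training, then a short full-attention adaptation phase at the end
# so the model regains dense token-by-token attention.

def attention_mode(step, total_steps, full_frac=0.02):
    """Return the attention mode to use at a given training step.
    `full_frac` is the assumed fraction of steps reserved for the
    final full-attention adaptation."""
    if step >= total_steps * (1 - full_frac):
        return "full"    # brief dense-attention adaptation at the end
    return "sparse"      # accelerated mode for most of training

modes = [attention_mode(s, total_steps=1000) for s in range(1000)]
# only the last 2% of steps pay the full quadratic cost
```

Because the switch happens only once, near the end, almost all of the training run enjoys the sparse speedup while the final checkpoint still behaves like a full-attention model.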
