Apple Researcher Performs Transformer Attention 'Surgery': Makes Every Token Forget Itself for Better Performance

According to 1M AI News monitoring, Apple machine learning research scientist Shuangfei Zhai has published a paper proposing "Exclusive Self-Attention" (XSA). The change is simple: in a standard Transformer, each token includes its own position when computing attention; XSA forcibly excludes the contribution from a token's own position, so the token draws information only from its context. The intuition is that a token already knows itself, and the value of the attention mechanism lies in telling it about its surroundings.
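The description above translates into a small change to standard scaled dot-product attention: mask out the diagonal of the attention-score matrix before the softmax. The sketch below illustrates that idea only and is not code from the paper; the tensor shapes, the causal setting, and the fallback that lets position 0 keep attending to itself (so its attention row is not empty) are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def exclusive_self_attention(q, k, v, causal=True):
    """q, k, v: (batch, heads, seq_len, head_dim). Illustrative sketch only."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (B, H, T, T) attention scores
    T = scores.size(-1)
    neg_inf = torch.finfo(scores.dtype).min

    if causal:
        # usual causal mask: token t may only attend to positions <= t
        causal_mask = torch.ones(T, T, device=scores.device).triu(1).bool()
        scores = scores.masked_fill(causal_mask, neg_inf)

    # the "exclude self" step: mask out each token's own position on the diagonal
    self_mask = torch.eye(T, dtype=torch.bool, device=scores.device)
    if causal:
        # assumption for this sketch: position 0 keeps attending to itself,
        # otherwise its row would have no valid positions left after masking
        self_mask[0, 0] = False
    scores = scores.masked_fill(self_mask, neg_inf)

    attn = F.softmax(scores, dim=-1)
    return attn @ v

# usage: drop-in where scaled dot-product attention would normally be called
B, H, T, D = 2, 4, 16, 32
q, k, v = (torch.randn(B, H, T, D) for _ in range(3))
out = exclusive_self_attention(q, k, v)   # shape (2, 4, 16, 32)
```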

In experiments at scales up to 27 billion parameters, XSA consistently outperformed standard self-attention, and the advantage grew more pronounced as sequence length increased. Zhai is also an author of the Attention Free Transformer (AFT) and continues to explore alternatives to the standard attention mechanism.
