According to monitoring by Dynaction Beating, Sakana AI, in collaboration with NVIDIA, has open-sourced a sparse data format named TwELL together with an accompanying acceleration kernel. The innovation lets GPUs skip the near-zero computations that contribute nothing when running large models. Without compromising model accuracy, the approach speeds up inference on the H100 by up to 30% and training by up to 24%, while significantly reducing peak memory usage.
The feed-forward network (FFN) layers of a large model account for the majority of its parameters and compute. In practice, however, during text generation more than 80% of FFN neurons are "dormant" (their activation values are close to zero) and contribute nothing to the final output. Skipping these neurons could save substantial compute, but modern GPUs are built to process dense matrices uniformly: with traditional sparse methods, gathering the scattered useful values incurs heavy overhead from repeated index lookups and memory reads, wiping out the computational savings.
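To make the idea concrete, here is a minimal PyTorch sketch (not Sakana AI's code; the layer sizes and the near-zero threshold are illustrative assumptions) that measures how many FFN activations end up effectively zero after ReLU and verifies that skipping the dormant neurons leaves the output unchanged. The per-token gather loop also hints at why naive skipping is slow on a GPU.

```python
import torch

# Hypothetical illustration: activation sparsity in a single FFN layer.
torch.manual_seed(0)

d_model, d_ff = 1024, 4096          # FFN typically expands the hidden size ~4x
W_in = torch.randn(d_ff, d_model) / d_model**0.5
W_out = torch.randn(d_model, d_ff) / d_ff**0.5
x = torch.randn(32, d_model)        # a batch of 32 token representations

# First projection plus ReLU; with random weights roughly half the outputs are
# zero, and the article reports over 80% near-zero in trained models.
h = torch.relu(x @ W_in.T)
active = h.abs() > 1e-6             # neurons that actually contribute
print(f"active neurons: {active.float().mean():.1%}")

# Dense path: multiply the full activation matrix regardless of zeros.
y_dense = h @ W_out.T

# Sparse path (per token): use only the columns of W_out for active neurons.
# The result is identical, but the scattered gathers are exactly the overhead
# that a GPU-friendly format is meant to remove.
y_sparse = torch.zeros_like(y_dense)
for i in range(h.shape[0]):
    idx = active[i].nonzero(as_tuple=True)[0]
    y_sparse[i] = h[i, idx] @ W_out[:, idx].T

print("max difference:", (y_dense - y_sparse).abs().max().item())
```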
The TwELL format is designed to break this hardware bottleneck. It matches the GPU's parallel execution model: instead of piecing together non-zero data scattered across memory, as traditional methods do, it divides the data into small tiles of the kind GPUs process best. Each GPU compute unit can then pack the useful data locally, avoiding time-consuming global memory reads and writes and slotting cleanly into the acceleration pipeline of modern chips.
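The report does not spell out TwELL's exact layout, so the NumPy sketch below only illustrates the general idea behind tile-wise, ELL-style packing: within each tile of rows, the non-zero columns are gathered into a small dense block plus a column-index array, so a thread block can work on contiguous data instead of chasing scattered entries. Tile size and names here are assumptions.

```python
import numpy as np

# Hypothetical sketch of tile-wise ELL-style packing (not the actual TwELL code).
TILE = 4  # rows handled together by one GPU thread block in this toy example

def pack_tiled_ell(mat: np.ndarray, tile: int = TILE):
    """Pack each tile of rows into a dense value block plus column indices."""
    tiles = []
    for r0 in range(0, mat.shape[0], tile):
        block = mat[r0:r0 + tile]
        # Columns that are non-zero anywhere in this tile are kept together,
        # so every row in the tile reads the same contiguous set of columns.
        cols = np.flatnonzero(np.any(block != 0, axis=0))
        tiles.append((cols, block[:, cols].copy()))
    return tiles

def tiled_ell_matvec(tiles, x: np.ndarray, tile: int = TILE):
    """Compute y = A @ x using only the packed non-zero columns of each tile."""
    y = np.zeros(len(tiles) * tile)
    for t, (cols, vals) in enumerate(tiles):
        y[t * tile: t * tile + vals.shape[0]] = vals @ x[cols]
    return y

# A sparse matrix standing in for an FFN weight slice with mostly dormant columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16)) * (rng.random((8, 16)) < 0.2)
x = rng.standard_normal(16)

tiles = pack_tiled_ell(A)
print(np.allclose(A @ x, tiled_ell_matvec(tiles, x)))  # True: same result, less data
```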
In a practical test on a 1.5-billion-parameter model, adding a light regularization term during training cut the proportion of neurons that actually need to be computed to under 2%, with no degradation on seven downstream tasks. The data also revealed a pattern: the larger the model, the more dormant neurons it contains (the non-zero ratio of a 2-billion-parameter model is 38% lower than that of a 500-million-parameter model). This suggests that as models continue to scale up, this hardware-centric optimization will deliver even greater performance gains.
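The report does not say which regularizer was used. One standard way to encourage dormant neurons, sketched below purely as an assumption, is to add an L1 penalty on the FFN activations to the training loss; the coefficient and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: an L1 activation penalty as one possible "slight
# regularization" that pushes more FFN activations toward zero during training.
L1_COEFF = 1e-4  # illustrative assumption, not a value from the report

class SparseFFN(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        h = torch.relu(self.up(x))
        # Return the activations so the training loop can penalize their L1 norm.
        return self.down(h), h

model = SparseFFN()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(64, 512)
target = torch.randn(64, 512)

for _ in range(10):  # dummy training steps on synthetic data
    out, h = model(x)
    task_loss = nn.functional.mse_loss(out, target)
    sparsity_loss = L1_COEFF * h.abs().mean()   # nudges activations toward zero
    (task_loss + sparsity_loss).backward()
    opt.step()
    opt.zero_grad()

print(f"fraction of near-zero activations: {(h.abs() < 1e-3).float().mean():.1%}")
```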
