
Google open-sources text diffusion model DiffusionGemma: over 1,000 tokens per second on a single GPU, with up to a 4x speedup

According to 动察 Beating monitoring, Google has released DiffusionGemma, an experimental open-source large model that generates text by diffusion rather than word by word, as traditional large language models do. The model uses a mixture-of-experts (MoE) architecture with 26 billion total parameters, of which only 3.8 billion are activated in each forward pass. By generating entire blocks of text in parallel, it achieves up to a 4x speedup in local GPU inference.

Unlike traditional "typewriter-style" word-by-word generation, DiffusionGemma works much like image generation: it first fills a canvas with random placeholders, then iteratively removes noise and locks in the correct text over multiple time steps. Each forward pass generates 256 tokens in parallel, and every token can attend to every other token through bidirectional attention. This bidirectional attention gives it a significant advantage in non-linear generation tasks such as code completion, in-line editing, and mathematical formula generation. However, DiffusionGemma's overall output quality is currently lower than that of the standard Gemma 4.
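The "fill the canvas, then lock in text step by step" loop described above can be illustrated with a toy sketch. This is not DiffusionGemma's actual algorithm or API: `toy_predict` is a hypothetical stand-in for the model that returns a random token and a random confidence, and the rule of locking in the most confident positions at each step is an assumption borrowed from common masked-diffusion decoders.

```python
import random

MASK = "<mask>"  # placeholder symbol for a position that is not yet decided


def toy_predict(canvas, position, rng):
    """Hypothetical stand-in for the model: propose (token, confidence).

    A real model would look at the whole canvas with bidirectional
    attention; here the proposal is random, for illustration only.
    """
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    return rng.choice(vocab), rng.random()


def diffusion_decode(block_size=8, steps=4, seed=0):
    """Fill a block of masked positions over a fixed number of steps."""
    rng = random.Random(seed)
    canvas = [MASK] * block_size
    per_step = -(-block_size // steps)  # ceiling division
    history = []
    for _ in range(steps):
        masked = [i for i, tok in enumerate(canvas) if tok == MASK]
        if not masked:
            break
        # One "forward pass": propose a token for every masked position.
        proposals = {i: toy_predict(canvas, i, rng) for i in masked}
        # Lock in only the most confident proposals; the rest stay masked.
        ranked = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:per_step]:
            canvas[i] = proposals[i][0]
        history.append(list(canvas))
    return canvas, history


if __name__ == "__main__":
    final, history = diffusion_decode()
    for step, snapshot in enumerate(history, start=1):
        print(step, " ".join(snapshot))
```

Each printed line shows the canvas after one step, with fewer `<mask>` entries each time. The point of the sketch is that the number of passes depends on `steps`, not on the block length, which is where the parallel speedup over word-by-word decoding would come from.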

In hardware tests, a single NVIDIA H100 GPU generates more than 1,000 tokens per second, while a consumer-grade NVIDIA GeForce RTX 5090 exceeds 700 tokens per second. With 4-bit floating-point (NVFP4) quantization, inference VRAM usage drops to within 18 GB, significantly lowering the barrier to local deployment.
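A rough back-of-the-envelope calculation shows why a 4-bit format brings a 26-billion-parameter model within reach of a single consumer GPU. The sketch below counts weight storage only. The 16-bit baseline and the 4.5 bits-per-parameter figure (4-bit values plus an assumed overhead for per-block scaling factors) are illustrative assumptions, not numbers from the report, and the report does not break down how the 18 GB is used.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 26e9  # total parameter count reported for DiffusionGemma

# Assumed 16-bit baseline (e.g. BF16 weights), for comparison only.
print(weight_memory_gb(TOTAL_PARAMS, 16))   # 52.0
# Pure 4-bit weights, ignoring any scaling-factor overhead.
print(weight_memory_gb(TOTAL_PARAMS, 4))    # 13.0
# Assumed 4.5 effective bits per parameter once block scales are included.
print(weight_memory_gb(TOTAL_PARAMS, 4.5))  # 14.625
```

Under these assumptions the weights alone take roughly 13 to 15 GB, which leaves a few gigabytes of the reported 18 GB for activations and runtime buffers. Note that all 26 billion parameters must be resident in memory even though only 3.8 billion are active per forward pass.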

DiffusionGemma's weights are available on Hugging Face and are supported by mainstream development tools such as MLX, vLLM, Unsloth, and NVIDIA NeMo.
