
Google open-sources the text diffusion model DiffusionGemma: over 1,000 tokens per second on a single GPU, up to 4x faster.

According to monitoring by 动察 Beating, Google has released DiffusionGemma, an experimental open-source large language model that generates text through diffusion rather than the token-by-token decoding of traditional large language models. DiffusionGemma uses a mixture-of-experts (MoE) architecture with 26 billion total parameters, of which only 3.8 billion are activated in each forward pass. By generating entire blocks of text in parallel, it achieves up to a 4x speedup in local GPU inference.

Unlike traditional "typewriter-style" token-by-token generation, DiffusionGemma works much like image generation: it first fills a canvas with random placeholders, then iteratively removes noise and locks in the correct text over multiple time steps. Each forward pass generates 256 tokens in parallel, and every token attends to all the others through bidirectional attention. This bidirectional attention is a significant advantage in non-linear generation tasks such as code completion, in-line editing, and mathematical formula generation. However, DiffusionGemma's overall output quality is currently lower than that of the standard Gemma 4.
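The denoising loop described above can be illustrated with a toy sketch. This is not DiffusionGemma's actual code or API: `toy_denoiser`, `diffusion_decode`, the `<mask>` placeholder, and the "unlock the most confident positions first" schedule are all hypothetical, and the stand-in denoiser simply looks up a known target sentence where a real model would predict every position from bidirectional context.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token for a not-yet-decided position


def toy_denoiser(tokens, target, rng):
    """Hypothetical stand-in for the model.

    For every masked position it returns (proposed_token, confidence).
    A real model would predict these from the whole block at once;
    here the proposal is read from `target` and the confidence is random.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def diffusion_decode(target, steps=4, seed=0):
    """Start from an all-mask block and lock in tokens over `steps` passes."""
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    history = []
    for step in range(steps):
        proposals = toy_denoiser(tokens, target, rng)
        if not proposals:
            break
        # Spread the remaining masked positions evenly over the remaining steps.
        remaining_steps = steps - step
        k = -(-len(proposals) // remaining_steps)  # ceiling division
        # Lock in the k positions the denoiser is most confident about.
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    _, snapshots = diffusion_decode(sentence, steps=4, seed=0)
    for n, snapshot in enumerate(snapshots, start=1):
        print(f"step {n}: {' '.join(snapshot)}")
```

Positions are filled in an arbitrary order instead of left to right, which is why this style of decoding suits infilling tasks such as in-line editing.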

In hardware tests, a single NVIDIA H100 GPU generates more than 1,000 tokens per second, while a consumer-grade NVIDIA GeForce RTX 5090 exceeds 700 tokens per second. With 4-bit floating-point (NVFP4) quantization, inference VRAM usage falls to within 18 GB, significantly lowering the barrier to local deployment.
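A rough back-of-the-envelope check shows why the 18 GB figure is plausible. The sketch below counts weight storage only and uses the parameter counts reported above; the 16-bit baseline is an assumption added for comparison, and the helper `weight_memory_gb` is hypothetical. Quantization scale factors, activations, and runtime buffers are not modeled, so actual usage will be higher than the 4-bit number.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 26e9    # total parameters, as reported in the article
ACTIVE_PARAMS = 3.8e9  # parameters activated per forward pass, as reported

# Assumption: a 16-bit baseline, shown only for comparison.
baseline_16bit_gb = weight_memory_gb(TOTAL_PARAMS, 16)
# 4 bits per weight, ignoring the overhead of quantization scale factors.
weights_4bit_gb = weight_memory_gb(TOTAL_PARAMS, 4)
# Share of the weights that take part in any single forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"16-bit weights: {baseline_16bit_gb:.1f} GB")
    print(f"4-bit weights:  {weights_4bit_gb:.1f} GB")
    print(f"active share:   {active_fraction:.1%}")
```

Under these assumptions the 4-bit weights alone take about 13 GB, which leaves several gigabytes of headroom inside the reported 18 GB. The memory estimate uses the total parameter count because an MoE model normally keeps all experts resident in memory even though only a fraction of them is active per pass.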

DiffusionGemma's weights are available on Hugging Face, and the model is supported by mainstream development tools including MLX, vLLM, Unsloth, and NVIDIA NeMo.
