
NVIDIA Breaks Down Blackwell Costs: GPU Price Nearly Doubles, Token Cost Falls 35x

According to Dochat Beating's monitoring, NVIDIA has published a blog post on choosing inference hardware. Its core argument fits in one sentence: inference infrastructure should be evaluated by "cost per token," not "cost per GPU per hour." Compared on hourly GPU price, Blackwell is more expensive; compared on token cost, Blackwell is far cheaper than the previous generation.

The blog uses DeepSeek-R1, a Mixture-of-Experts (MoE) reasoning model, as the test workload and compares Blackwell (GB300 NVL72) with the previous-generation Hopper (HGX H200). At reference cloud rental prices, Blackwell costs $2.65 per GPU per hour, nearly twice Hopper's $1.41. But per-GPU output rises from 90 to 6,000 tokens per second, which the blog puts at a 65x throughput gain. As a result, the cost per million tokens falls from $4.20 to $0.12, a 35x reduction. Token output per megawatt rises 50x.
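The link between hourly GPU price, throughput, and token cost can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cost_per_million_tokens` is hypothetical, and the inputs are the rounded figures quoted above. With those rounded inputs the Hopper result comes out near $4.35 rather than the quoted $4.20, presumably because the published throughput figures are rounded; the Blackwell result lands close to the quoted $0.12.

```python
def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_sec_per_gpu: float) -> float:
    """Cost in USD to generate one million tokens on a single GPU.

    Hypothetical helper for illustration; inputs are the rounded
    figures quoted in the article, not raw benchmark data.
    """
    tokens_per_hour = tokens_per_sec_per_gpu * 3600  # seconds per hour
    return gpu_hour_usd / tokens_per_hour * 1_000_000


# Rounded figures from the article (assumed to be per-GPU averages).
hopper = cost_per_million_tokens(gpu_hour_usd=1.41, tokens_per_sec_per_gpu=90)
blackwell = cost_per_million_tokens(gpu_hour_usd=2.65, tokens_per_sec_per_gpu=6000)

print(f"Hopper:    ${hopper:.2f} per million tokens")
print(f"Blackwell: ${blackwell:.2f} per million tokens")
```

The point of the calculation is that the hourly price sits in the numerator and throughput in the denominator, so a large enough throughput gain outweighs a higher hourly price.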

One caveat: the $0.12 figure relies on FP4 low-precision inference plus MTP (Multi-Token Prediction, which lets the model generate several tokens at once to speed up output). According to raw data from SemiAnalysis InferenceX v2, running DeepSeek-R1 on the same GB300 NVL72 without MTP costs about $2.35 per million tokens; with MTP enabled, the cost falls to about $0.11, a roughly 21x difference from this single optimization. All of these results come from tests on one model, DeepSeek-R1; the numbers may differ for other model architectures and scales.
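As a quick sanity check on the quoted multiples, the ratios can be recomputed from the per-million-token costs cited in this article. The helper name `improvement_factor` is hypothetical, and the inputs are the rounded dollar figures, so the results are approximate.

```python
def improvement_factor(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after


# USD per million tokens, as quoted in the article (rounded values).
mtp_gain = improvement_factor(before=2.35, after=0.11)         # GB300 NVL72, MTP off vs. on
generation_gain = improvement_factor(before=4.20, after=0.12)  # Hopper vs. Blackwell

print(f"MTP alone:            {mtp_gain:.1f}x")         # 21.4x
print(f"Hopper to Blackwell:  {generation_gain:.1f}x")  # 35.0x
```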
