
Gemini 3.2 Real-Time Model Appears on Google Cloud, with Inference Cost Just One-Twentieth of GPT-5.5's

According to monitoring by Dynamic Insight Beating, a foundational model option named gemini-3.2-flash-lite-live-preview has appeared in the model selection list of the Google Cloud Console. Traces of this model series had already surfaced earlier this month in iOS app build packages and in AI Studio.

The lite and live suffixes on the new option indicate that Google is splitting off a specialized version for ultra-low-latency real-time interaction. Bindu Reddy, CEO of Abacus.AI, previously claimed that Gemini 3.2 Flash reaches 92% of GPT-5.5's coding and reasoning capability, while distillation and sparsification techniques cut its inference cost to one-twentieth of the latter's, with most queries returning in under 200 milliseconds.

With the model surfacing early in the cloud console, industry insiders expect this cost-effective lightweight model to be officially launched at the Google I/O conference on May 20.
