
Gemini 3.2 Real-Time Model Appears on Google Cloud, with Inference Cost Just One-Twentieth of GPT-5.5's

According to monitoring by Dynamic Insight Beating, a foundational model option named gemini-3.2-flash-lite-live-preview has appeared in the model selection list of the Google Cloud Console. Traces of this model series had already surfaced earlier this month in iOS app build packages and in AI Studio.

The lite and live suffixes on the new option indicate that Google is splitting off a specialized version for ultra-low-latency real-time interaction. Bindu Reddy, CEO of Abacus.AI, previously claimed that Gemini 3.2 Flash reaches 92% of GPT-5.5's coding and reasoning capability, while distillation and sparsification techniques cut its inference cost to one-twentieth of the latter's, with most queries returning in under 200 milliseconds.

With the model surfacing early in the cloud console, industry insiders expect this cost-effective lightweight model to be officially launched at the Google I/O conference on May 20.
