According to Dynamic Beating's monitoring, Google plans to unveil its next-generation lightweight model, Gemini 3.2 Flash, at the I/O event on May 20. The model's overall performance is said to be roughly on par with GPT-5.5, though it notably falls short of Anthropic's Mythos.
Abacus.AI CEO Bindu Reddy relayed rumors that Gemini 3.2 Flash achieves 92% of GPT-5.5's performance on coding and reasoning tasks, while its inference cost is fifteen to twenty times lower and most queries return in under 200 milliseconds. She believes Google's distillation and sparsification techniques are paying off, essentially compressing a frontier model down to Flash scale without hitting the usual performance cliff.
Signs of a Gemini 3.2 Flash leak had surfaced earlier: in early May, traces of the model were found in iOS app build packages and in AI Studio metadata, and it later appeared anonymously in evaluations on LM Arena. Early testers reported that the model excels at creative coding tasks, even outperforming Gemini 3.1 Pro on some benchmarks.
