According to Dynamic Insight monitoring, recently, multiple urgent calls for help regarding the Gemini API billing system running out of control have emerged on the Google AI Developer Forum. Several developers, in the course of normal usage, faced massive unexpected charges due to underlying system bugs. For example, one individual was debited nearly 27,000 yuan in just 12 hours. Currently, Google's billing and technical teams are deflecting responsibility for the issue, and no official fix statement or expedited refund channel has been issued.
Upon investigation, the two main core bugs that led to developers receiving astronomical bills were identified as follows: Firstly, the "Ghost Cache" vulnerability, where the context created by developers via the API expired or was deleted, rendering the front-end management list empty, yet Google's backend billing continued to charge at a rate of thousands of yuan per hour. Secondly, the "Infinite Mind Loop" trap, wherein when using tools such as web search, the model's "thought budget limit" malfunctioned, causing the model to engage in endless reasoning when handling simple tasks, burning through up to 64,000 tokens before timing out and crashing, even if ultimately delivering "zero output" (providing no useful answers). Google still invoiced the full skyrocketed 1500x thought cost.
Due to a severe 32 to 72-hour latency in Google's cloud billing system and a lack of automatic quota breakage mechanism, developers were debited significantly before receiving any alerts. As a result of official customer service deflection and no direct responses on the forum, to mitigate financial risk, some affected developers have announced a complete discontinuation of Gemini's context cache and inference model in the production environment.
