According to monitoring by Dynamic Insight Beating, Google is adding Computer Use as a built-in tool integrated directly into its flagship Gemini 3.5 Flash model.
Before the native integration, developers had to call the specialized Gemini 2.5 Computer Use model to perform agent tasks. Now developers and enterprise users can control devices directly with the flagship model through the Gemini API or the Google Cloud Gemini Enterprise Agent Platform (formerly the Vertex AI platform), which simplifies agent development architecture.
The built-in Computer Use tool takes screenshots from browser, mobile, or desktop environments as input for visual perception and step-by-step reasoning. It then outputs operation commands such as mouse clicks, keyboard input, scrolling, and menu navigation to automate long-running workflows like software regression testing and cross-page data collection. To aid debugging and auditing, the model attaches an "intent" field to each generated command that explains the reasoning behind that step.
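The screenshot-in, command-out cycle described above can be sketched as a simple loop. This is a minimal illustration and not Google's API: the `Action` class, its field names other than "intent", and the `propose`, `capture`, and `execute` callables are all hypothetical stand-ins, and the model is replaced by a scripted stub so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical shape of one model-issued command (assumed, not the real schema)."""
    kind: str                      # e.g. "click", "type", "scroll", "done"
    args: dict = field(default_factory=dict)
    intent: str = ""               # the model's explanation of this step


def run_agent(propose, capture, execute, max_steps=10):
    """Capture a screenshot, ask for the next action, record its intent, execute it."""
    audit_log = []
    for step in range(max_steps):
        screenshot = capture()
        action = propose(screenshot, audit_log)
        # Record the intent before acting so the log survives a failed step.
        audit_log.append((step, action.kind, action.intent))
        if action.kind == "done":
            break
        execute(action)
    return audit_log


# Scripted stand-in for the model; a real agent would call the model here.
_script = iter([
    Action("click", {"x": 120, "y": 48}, "Focus the search box"),
    Action("type", {"text": "release notes"}, "Enter the query"),
    Action("done", {}, "Results page is visible"),
])


def fake_propose(screenshot, history):
    return next(_script)


def fake_capture():
    return b"fake-screenshot-bytes"


executed = []
log = run_agent(fake_propose, fake_capture, executed.append)
for step, kind, intent in log:
    print(step, kind, intent)
```

Keeping the intent in a separate audit log, rather than only printing it, is one way to make each step reviewable after the run.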
To counter the prompt injection risks that agents face in real-world network environments, Google has run targeted adversarial training on the model and offers two optional safeguards: mandatory human approval for irreversible operations such as fund transfers and file deletions, and automatic task termination when indirectly injected instructions are detected in a screenshot.
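On the client side, the two safeguards could be enforced with a small gate placed in front of the executor. The sketch below is an assumption-laden illustration: the `ProposedStep` class, the `injection_suspected` flag, the `IRREVERSIBLE` set, and the `approve` callback are hypothetical names, and how the real tool signals either condition is not described in the report.

```python
from dataclasses import dataclass

# Hypothetical action kinds treated as irreversible (assumed for illustration).
IRREVERSIBLE = {"transfer_funds", "delete_file"}


@dataclass
class ProposedStep:
    kind: str
    intent: str
    injection_suspected: bool = False  # assumed flag, not a documented field


class TaskTerminated(Exception):
    """Raised to stop the whole task, not just the current step."""


def gate(step, approve):
    """Return True if the step may run, False if a human declined it."""
    if step.injection_suspected:
        # Safeguard 2: abort the task when injected instructions are suspected.
        raise TaskTerminated(f"possible injected instruction at step: {step.intent}")
    if step.kind in IRREVERSIBLE:
        # Safeguard 1: a human must approve irreversible operations.
        return bool(approve(step))
    return True


def deny_all(step):
    """Stand-in for a human reviewer who declines every request."""
    return False


allowed_click = gate(ProposedStep("click", "Open the invoices tab"), deny_all)
allowed_delete = gate(ProposedStep("delete_file", "Remove old export"), deny_all)

try:
    gate(ProposedStep("type", "Follow on-page instruction", True), deny_all)
    terminated = False
except TaskTerminated:
    terminated = True

print(allowed_click, allowed_delete, terminated)
```

Raising an exception for suspected injection, instead of returning False, reflects the difference between the two safeguards: a declined approval skips one step, while a suspected injection ends the task.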
Browserbase currently offers a hosted online demo environment (gemini.browserbase.com), and Google has simultaneously open-sourced a reference implementation named computer-use-preview on GitHub.
