
Google's flagship Gemini 3.5 Flash model now natively supports computer control, enabling enterprise-grade agent automation

According to monitoring by Dynamic Insight Beating, Google is integrating Computer Use directly into its flagship Gemini 3.5 Flash model as a built-in tool.

Before this native integration, developers had to call the dedicated Gemini 2.5 Computer Use model to perform agent tasks. Now developers and enterprise users can control devices directly with the flagship model through the Gemini API or Google Cloud's Gemini Enterprise Agent Platform (formerly Vertex AI), which simplifies the architecture of agent applications.

The built-in Computer Use tool takes screenshots of browser, mobile, or desktop environments as its visual input and reasons through the task step by step. It then outputs operation commands such as mouse clicks, keyboard input, scrolling, and menu navigation, which lets it complete long, multi-step workflows such as software regression testing and data collection across multiple pages. To aid debugging and auditing, the model attaches an "intent" field to each generated command that explains the reasoning behind that step.
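The perceive, reason, act cycle described above can be pictured as a simple loop. This is a minimal Python sketch under stated assumptions, not Google's implementation or the Gemini API: the action names (`click`, `type`, `scroll`, `select_menu`), the dict layout of each command, and the callables `propose`, `capture_screen`, and `execute` are all hypothetical stand-ins for the model call and the browser or desktop harness. Only the idea of an `intent` string attached to each command comes from the article.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical action vocabulary; the real tool's action names and arguments may differ.
SUPPORTED_ACTIONS = {"click", "type", "scroll", "select_menu"}


@dataclass
class Action:
    name: str
    args: dict[str, Any]
    intent: str  # model's stated reason for this step, kept for debugging/auditing


def parse_action(raw: dict[str, Any]) -> Action:
    """Validate one model-proposed command (hypothetical format) and wrap it in an Action."""
    name = raw["action"]
    if name not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {name}")
    return Action(name=name, args=raw.get("args", {}), intent=raw.get("intent", ""))


def agent_loop(
    propose: Callable[[bytes, list[str]], Optional[dict[str, Any]]],
    capture_screen: Callable[[], bytes],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> list[str]:
    """Screenshot -> model proposes a command -> execute it, until the model returns None."""
    audit_log: list[str] = []
    for step in range(1, max_steps + 1):
        screenshot = capture_screen()         # visual perception input
        raw = propose(screenshot, audit_log)  # hypothetical model call over screen + history
        if raw is None:                       # model signals the task is complete
            break
        action = parse_action(raw)
        # Record the command together with its intent before running it.
        audit_log.append(f"step {step}: {action.name} {action.args} | intent: {action.intent}")
        execute(action)                       # e.g. drive a browser or desktop harness
    return audit_log
```

Keeping each command's intent next to it in the audit log means a failed run can be traced back to the step where the model's stated reasoning and its actual action diverged.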

To counter the prompt-injection risks that agents face in real web environments, Google has applied targeted adversarial training to the model and offers two optional safeguards: mandatory human approval before irreversible operations such as fund transfers or file deletions, and automatic termination of the task when indirectly injected instructions are detected in a screenshot.
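One way a client-side harness might apply the two safeguards before executing each command is sketched below in Python. Everything here is an assumption for illustration: the action names in `IRREVERSIBLE_ACTIONS`, the `Safeguards` class, and especially `naive_injection_detector`, a toy keyword check standing in for whatever detection Google actually performs. Both protections are boolean flags because the article describes them as optional.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical names for actions treated as irreversible (fund transfers, file deletions, ...).
IRREVERSIBLE_ACTIONS = {"transfer_funds", "delete_file"}


class TaskTerminated(Exception):
    """Raised when a safeguard stops the task."""


def naive_injection_detector(screen_text: str) -> bool:
    # Toy keyword heuristic for illustration only; not how Gemini detects injected instructions.
    markers = ("ignore previous instructions", "ignore all prior instructions")
    lowered = screen_text.lower()
    return any(marker in lowered for marker in markers)


@dataclass
class Safeguards:
    approve: Callable[[str, dict[str, Any]], bool]  # human-in-the-loop approval callback
    detect_injection: Callable[[str], bool] = naive_injection_detector
    require_approval: bool = True        # safeguard 1 (optional)
    terminate_on_injection: bool = True  # safeguard 2 (optional)

    def check(self, screen_text: str, action: str, args: dict[str, Any]) -> None:
        """Run before executing each command; raises TaskTerminated to stop the task."""
        if self.terminate_on_injection and self.detect_injection(screen_text):
            raise TaskTerminated("possible injected instructions detected on screen")
        if self.require_approval and action in IRREVERSIBLE_ACTIONS:
            if not self.approve(action, args):
                raise TaskTerminated(f"human reviewer rejected irreversible action: {action}")
```

Because the check raises instead of returning a flag, a caller cannot silently skip it: the task stops unless every gate passes, and ordinary commands such as clicks go through without prompting a reviewer.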

Browserbase currently hosts an online demo environment (gemini.browserbase.com), and Google has also open-sourced a reference implementation, computer-use-preview, on GitHub.
