MiniMax Agent Launches Computer Use, Enabling Remote Desktop Control via Lark and WeChat

According to 1M AI News monitoring, MiniMax today released two updates to its desktop Agent: the Pocket feature (in beta) and the official launch of Computer Use.

Pocket connects the Agent to mainstream IM platforms, including Feishu (Lark), WeChat, WorkWeChat, and Slack. Users send commands in an IM chat; the Agent carries out the task on their computer and posts the results back to the same conversation. Computer Use lets the Agent see the screen, control the mouse and keyboard, and work directly with local software, system settings, and other GUI-based tasks. Together, the two capabilities let users issue commands from their phones while the Agent executes them on their computer, so they no longer need to sit in front of it.

Technically, MiniMax splits desktop operations into four tool domains: Desktop Control (screenshots, mouse, and keyboard input), Window Manager (window management and application launching), Browser Engine (DOM manipulation and CSS selectors), and Clipboard (clipboard reads and writes). Combined with CLI and Bash tools for Feishu, WorkWeChat, and other platforms, this gives the Agent more than 60 tools in total.
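To make the "tool domains" idea concrete, here is a minimal sketch of how tools might be registered and grouped by domain. MiniMax has not published its implementation; the `Domain` enum, the `ToolRegistry` class, and every tool name below (`screenshot`, `click`, `launch_app`, `query_selector`, `clipboard_read`) are hypothetical, and the tools are stubs that return strings instead of driving the OS or a browser.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Domain(Enum):
    """Tool domains named in the article; this enum itself is hypothetical."""
    DESKTOP_CONTROL = "desktop_control"  # screenshots, mouse, keyboard input
    WINDOW_MANAGER = "window_manager"    # window management, app launching
    BROWSER_ENGINE = "browser_engine"    # DOM manipulation, CSS selectors
    CLIPBOARD = "clipboard"              # clipboard reads and writes
    CLI = "cli"                          # CLI/Bash tools for IM platforms


@dataclass(frozen=True)
class Tool:
    name: str
    domain: Domain
    run: Callable[..., object]


class ToolRegistry:
    """Hypothetical registry that groups tools by domain."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, domain: Domain):
        """Decorator that records a function as a tool in `domain`."""
        def wrap(fn: Callable[..., object]) -> Callable[..., object]:
            if name in self._tools:
                raise ValueError(f"duplicate tool name: {name}")
            self._tools[name] = Tool(name, domain, fn)
            return fn
        return wrap

    def by_domain(self) -> Dict[Domain, List[str]]:
        """Return tool names grouped by domain, in registration order."""
        grouped: Dict[Domain, List[str]] = defaultdict(list)
        for tool in self._tools.values():
            grouped[tool.domain].append(tool.name)
        return dict(grouped)

    def call(self, name: str, **kwargs: object) -> object:
        """Dispatch a tool call by name, as a model's tool-use output would."""
        return self._tools[name].run(**kwargs)


registry = ToolRegistry()


# Stub tools: real ones would call OS or browser automation APIs.
@registry.register("screenshot", Domain.DESKTOP_CONTROL)
def screenshot() -> str:
    return "<screenshot>"


@registry.register("click", Domain.DESKTOP_CONTROL)
def click(x: int, y: int) -> str:
    return f"clicked ({x}, {y})"


@registry.register("launch_app", Domain.WINDOW_MANAGER)
def launch_app(app: str) -> str:
    return f"launched {app}"


@registry.register("query_selector", Domain.BROWSER_ENGINE)
def query_selector(selector: str) -> str:
    return f"matched {selector}"


@registry.register("clipboard_read", Domain.CLIPBOARD)
def clipboard_read() -> str:
    return "<clipboard text>"
```

Grouping by domain keeps the tool list navigable as it grows past a few dozen entries, and a flat name-based `call` mirrors how a model typically emits a tool name plus arguments.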

For visual perception, the Agent outputs relative coordinates between 0 and 1, which the system converts into actual screen pixels so that operations stay equally accurate on Retina and 4K displays. After each step, the Agent automatically takes a screenshot to verify the result. If a step fails, it tries an alternative approach (for example, a keyboard shortcut instead of a mouse click); if the problem persists after several attempts, it proactively tells the user where it got stuck.
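The two mechanisms described above can be sketched as follows: scaling relative coordinates to a concrete screen size, and a verify-then-fall-back loop for each step. This is an illustrative sketch, not MiniMax's code. `to_pixels`, `run_step`, `StuckError`, and the `max_attempts` default of 3 are all hypothetical, and `verify` stands in for "take a screenshot and check the expected change happened". Whether the target space is physical pixels or the OS's logical points (as on Retina displays) is left to the caller; the point of relative coordinates is that the model's output does not depend on it.

```python
from typing import Callable, List, Tuple


def to_pixels(rel_x: float, rel_y: float, width: int, height: int) -> Tuple[int, int]:
    """Convert relative coordinates in [0, 1] to integer screen coordinates."""
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError("relative coordinates must be in [0, 1]")
    # Clamp so that 1.0 lands on the last pixel, not one past the edge.
    x = min(int(rel_x * width), width - 1)
    y = min(int(rel_y * height), height - 1)
    return x, y


class StuckError(RuntimeError):
    """Raised when every strategy fails; the message says where the agent got stuck."""


def run_step(
    step: str,
    strategies: List[Tuple[str, Callable[[], None]]],
    verify: Callable[[], bool],
    max_attempts: int = 3,
) -> str:
    """Try strategies in order, verifying after each; return the one that worked."""
    tried: List[str] = []
    for name, action in strategies[:max_attempts]:
        action()
        tried.append(name)
        # Stand-in for a screenshot-based check of the step's outcome.
        if verify():
            return name
    raise StuckError(f"stuck at {step!r}; tried: {', '.join(tried)}")


# The same model output maps to the same relative spot on a 4K and a Retina-sized screen.
print(to_pixels(0.5, 0.5, 3840, 2160))  # (1920, 1080)
print(to_pixels(0.5, 0.5, 2880, 1800))  # (1440, 900)

# Simulated step: the mouse click "misses", the keyboard shortcut succeeds.
state = {"saved": False}


def click_save_button() -> None:
    pass  # simulated miss: nothing changes on screen


def press_save_shortcut() -> None:
    state["saved"] = True


print(run_step(
    "save document",
    [("mouse_click", click_save_button), ("keyboard_shortcut", press_save_shortcut)],
    lambda: state["saved"],
))  # keyboard_shortcut
```

Raising an error that names the step, rather than retrying forever, is what lets the agent report back to the user where it got stuck.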

Permission management is also handled through the IM: before performing sensitive operations such as deleting files, the Agent pauses and sends a confirmation request to the IM conversation. Feishu and Slack present the request as an interactive card, while WeChat uses text commands for authorization. Users can also send a command to abort a task at any time.
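The confirmation flow described above amounts to a gate in front of each tool call. The sketch below is an assumption about how such a gate could work, not MiniMax's implementation: `PermissionGate`, `build_request`, `TaskAborted`, the `SENSITIVE_ACTIONS` set, the message dictionaries, and the accepted reply words are all hypothetical. The `ask` callback stands in for "post the card or text message to the IM chat and wait for the user's answer".

```python
import threading
from typing import Callable, Dict

# Hypothetical policy: which tool names require user confirmation.
SENSITIVE_ACTIONS = {"delete_file"}
# Per the article, Feishu and Slack show interactive cards; WeChat uses text commands.
CARD_PLATFORMS = {"feishu", "slack"}
APPROVALS = {"approve", "y", "yes"}


class TaskAborted(Exception):
    """Raised when the user has sent an abort command."""


def build_request(platform: str, action: str, target: str) -> Dict[str, object]:
    """Build a confirmation message in the style the platform supports."""
    question = f"Allow {action} on {target}?"
    if platform in CARD_PLATFORMS:
        return {"type": "card", "text": question, "buttons": ["Approve", "Reject"]}
    return {"type": "text", "text": f"{question} Reply Y to approve or N to reject."}


class PermissionGate:
    """Pauses sensitive actions until the user answers in the IM conversation."""

    def __init__(self, platform: str, ask: Callable[[Dict[str, object]], str]) -> None:
        self.platform = platform
        self.ask = ask  # sends a message to the IM chat and blocks for the reply
        self._abort = threading.Event()  # can be set from an IM listener thread

    def abort(self) -> None:
        """Called when the user sends an abort command."""
        self._abort.set()

    def execute(self, action: str, target: str, run: Callable[[], str]) -> str:
        # Check for an abort before every step, so a task can be stopped at any time.
        if self._abort.is_set():
            raise TaskAborted("task aborted by user")
        if action in SENSITIVE_ACTIONS:
            reply = self.ask(build_request(self.platform, action, target))
            if reply.strip().lower() not in APPROVALS:
                return f"skipped {action}: user declined"
        return run()
```

Passing `ask` in as a plain function keeps the gate independent of any particular IM SDK and makes it easy to test with a fake that returns a canned reply.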
