
Alibaba has launched Qwen-Robot, an embodied-intelligence foundation model suite that aligns natural language with physical actions across multiple domains to enable zero-shot deployment.

According to monitoring by Perceiving Beating, Alibaba's large-model team has released Qwen-Robot Suite, a suite of embodied-intelligence foundation models. It contains three base models, Qwen-RobotNav, Qwen-RobotManip, and Qwen-RobotWorld, which cover navigation, manipulation, and world simulation, respectively. The suite aims to align vision-language models with physical actions so that a single model generalizes across many tasks and many robot embodiments.

The navigation model, Qwen-RobotNav, unifies instruction following, object-goal navigation, target tracking, and autonomous driving in one model. Its visual attention strategy is parameterized, so the visual token budget and the frame sampling rate can be adjusted dynamically at inference time. Trained on 15.6 million samples, Qwen-RobotNav achieved state-of-the-art results in five navigation domains and has been deployed zero-shot on the Unitree (Yushu) Go2 quadruped robot.
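The article does not say how Qwen-RobotNav spends its visual token budget. The sketch below only illustrates the general idea of trading frames against a fixed token budget: it assumes a hypothetical fixed cost per encoded frame and picks evenly spaced frames that always include the newest one. `VisualBudget` and `sample_frames` are illustrative names, not part of any Qwen API.

```python
from dataclasses import dataclass


@dataclass
class VisualBudget:
    # Hypothetical knobs: total visual tokens allowed per inference call,
    # and the number of tokens each encoded frame consumes.
    max_tokens: int
    tokens_per_frame: int


def sample_frames(num_frames: int, budget: VisualBudget) -> list[int]:
    """Return indices of frames to encode so the total stays within budget.

    Frames are spread evenly from the oldest (index 0) to the newest
    (index num_frames - 1); the newest frame is always kept.
    """
    if num_frames <= 0 or budget.tokens_per_frame <= 0:
        return []
    k = min(num_frames, budget.max_tokens // budget.tokens_per_frame)
    if k <= 0:
        return []
    if k == 1:
        return [num_frames - 1]  # only room for the current observation
    step = (num_frames - 1) / (k - 1)  # step >= 1, so indices never collide
    return [round(i * step) for i in range(k)]
```

With a 1,024-token budget at 256 tokens per frame, `sample_frames(32, VisualBudget(1024, 256))` keeps four frames, `[0, 10, 21, 31]`. Shrinking `max_tokens` at inference time drops frames without retraining, which is one plausible way a budget could be traded against latency.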

The manipulation model, Qwen-RobotManip, pairs a Qwen3.5-4B vision-language backbone with a flow-matching DiT action head. It uses an 80-dimensional state-action representation and outputs incremental (delta) end-effector poses. The team trained it on more than 38,100 hours of data, including open-source robot demonstrations, human videos, and data synthesized through human-to-robot transfer, and the model reached a 91.4% success rate on the LIBERO-Plus benchmark.
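A flow-matching action head generates an action by starting from Gaussian noise and integrating a learned velocity field from t = 0 to t = 1. The sketch below shows that inference loop with fixed-step Euler updates over an 80-dimensional vector, matching the dimensionality in the article. The DiT itself is replaced by a toy velocity field, which is the exact velocity of the straight path from the noise sample to a known target. The real model's conditioning, step count, and solver are not described in the source, and all names here are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]
VelocityField = Callable[[Vector, float], Vector]

ACTION_DIM = 80  # from the article: 80-dimensional state-action representation


def sample_action(velocity: VelocityField, dim: int = ACTION_DIM,
                  steps: int = 10, seed: int = 0) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (action) with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_0 ~ N(0, I)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # never reaches 1.0, so the toy field below stays finite
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> VelocityField:
    """Stand-in for a trained DiT head (hypothetical).

    On the straight path x_t = (1 - t) * x_0 + t * target, the velocity is
    target - x_0, which equals (target - x_t) / (1 - t) in terms of x_t.
    """
    def v(x: Vector, t: float) -> Vector:
        return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]
    return v
```

Because this toy velocity is constant along the straight path, Euler integration lands on `target` up to floating-point error. A trained network only approximates the velocity, which is why real samplers tune the number of steps.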

The physical world prediction model, Qwen-RobotWorld, uses natural language as a unified interface for robot actions. Architecturally, it deeply couples Qwen2.5-VL semantic representations with video latents through a 60-layer dual-stream MMDiT. Trained on 8.6 million video-text pairs, Qwen-RobotWorld ranked first on physical-law-compliance evaluations such as EWMBench and WorldModelBench.
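"Dual-stream MMDiT" usually refers to the multimodal DiT design in which each modality keeps its own projection weights, while attention is computed jointly over the concatenated token sequence. The single-head, pure-Python sketch below illustrates that coupling between a text/semantic stream and a video-latent stream. It assumes this standard design. Qwen-RobotWorld's actual layer details (normalization, timestep modulation, head count, MLPs) are not given in the source and are omitted, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


class StreamWeights:
    """Per-stream Q/K/V projections: each modality keeps its own weights."""

    def __init__(self, wq: Matrix, wk: Matrix, wv: Matrix):
        self.wq, self.wk, self.wv = wq, wk, wv


def joint_attention(text: Matrix, video: Matrix,
                    tw: StreamWeights, vw: StreamWeights) -> Tuple[Matrix, Matrix]:
    # Project each stream with its own weights, then concatenate the token lists.
    q = matmul(text, tw.wq) + matmul(video, vw.wq)
    k = matmul(text, tw.wk) + matmul(video, vw.wk)
    v = matmul(text, tw.wv) + matmul(video, vw.wv)
    # Scaled dot-product attention over the joint sequence: every text token
    # attends to every video token and vice versa.
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    out = matmul(weights, v)
    # Split back so each stream can continue through its own MLP.
    return out[:len(text)], out[len(text):]
```

The point of the design is visible in the last step. The streams keep separate parameters, but after one joint attention layer, each text token's output already depends on the video tokens.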

All three models expose a language-first interface. Alibaba has also introduced Qwen-RobotClaw, a robot intelligence framework that lets higher-level planners (such as Qwen3.5) call the suite's models as physical tools to carry out multi-step tasks.
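Qwen-RobotClaw's API is not described in the source. The sketch below only illustrates the general "models as tools" pattern implied by a language-first interface. A planner emits a sequence of (tool, instruction) steps, and a router dispatches each natural-language instruction to the matching model. `RobotToolRouter`, `ToolCall`, and the tool names are hypothetical, and the lambdas stand in for calls to the navigation and manipulation models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A tool takes a natural-language instruction and returns a text result.
ToolFn = Callable[[str], str]


@dataclass
class ToolCall:
    tool: str         # hypothetical tool name, e.g. "nav" or "manip"
    instruction: str  # natural-language command passed straight to the model


class RobotToolRouter:
    """Hypothetical router mapping tool names to language-in, text-out callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        self._tools[name] = fn

    def run_plan(self, plan: List[ToolCall]) -> List[str]:
        """Execute planner steps in order, collecting each tool's result."""
        results: List[str] = []
        for step in plan:
            if step.tool not in self._tools:
                raise KeyError(f"unknown tool: {step.tool}")
            results.append(self._tools[step.tool](step.instruction))
        return results


if __name__ == "__main__":
    router = RobotToolRouter()
    router.register("nav", lambda ins: f"nav: {ins}")
    router.register("manip", lambda ins: f"manip: {ins}")
    plan = [ToolCall("nav", "go to the kitchen"),
            ToolCall("manip", "pick up the red cup")]
    print(router.run_plan(plan))
```

Because every tool accepts plain language, the planner needs no model-specific argument schema. This is the practical benefit of a language-first interface, although a real framework would also feed tool results back to the planner for replanning.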
