Alibaba PAI Open-Sources AgenticQwen Small Models: "Dual-Data Flywheel" Brings 8B Performance Close to 235B

According to Perceiving Beating monitoring, the Alibaba PAI team has released and open-sourced AgenticQwen, a series of small language models designed for industrial-grade tool calling, available in 8B and 30B-A3B versions. The models are trained with a reinforcement learning framework built on a "dual-data flywheel". They significantly reduce inference costs while delivering agent capabilities close to those of hundred-billion-parameter-class models.

The core mechanism is the "dual-data flywheel" training approach. Conventional synthetic data tends to become homogeneous, which leads to a bottleneck in model performance. AgenticQwen addresses this with two flywheels. The Inference Flywheel automatically generates more challenging variants of the problems the model gets wrong. The Agent Flywheel uses the model's execution trajectories to expand simple linear workflows (such as a single booking flow) into multi-branch behavior trees with constraints, refusals, and adversarial conditions, simulating complex real-world decision-making.
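The two expansion ideas described above can be illustrated with a toy sketch. Everything here is hypothetical: the `Node` structure, the function names, and the rule of branching at every step seen in a trajectory are assumptions made for illustration, not the PAI team's implementation. A real pipeline would presumably use a generator model instead of string tagging.

```python
from dataclasses import dataclass, field

# Branch kinds borrowed from the article's wording; how the real system
# represents them is not stated, so this tuple is an assumption.
BRANCH_KINDS = ("constraint", "refusal", "adversarial")


@dataclass
class Node:
    """One step in a workflow tree (illustrative structure only)."""
    action: str
    kind: str = "main"
    children: list["Node"] = field(default_factory=list)


def linear_to_tree(steps: list[str]) -> Node:
    """Chain the steps of a linear workflow into a single-path tree."""
    root = Node(steps[0])
    current = root
    for step in steps[1:]:
        nxt = Node(step)
        current.children.append(nxt)
        current = nxt
    return root


def expand(node: Node, trajectory_steps: set[str]) -> None:
    """Agent-flywheel stand-in: add alternative branches at every step
    that appears in an observed execution trajectory (assumed rule)."""
    main_children = list(node.children)  # snapshot before adding branches
    if node.action in trajectory_steps:
        for kind in BRANCH_KINDS:
            node.children.append(Node(f"{node.action} [{kind}]", kind=kind))
    for child in main_children:
        expand(child, trajectory_steps)


def count_paths(node: Node) -> int:
    """Count root-to-leaf paths, i.e. distinct scenarios the tree encodes."""
    if not node.children:
        return 1
    return sum(count_paths(child) for child in node.children)


def harder_variants(failed_tasks: list[str], n: int = 2) -> list[str]:
    """Inference-flywheel stand-in: derive n new task prompts per failed
    task. This only tags strings; it does not make anything harder."""
    return [
        f"{task} (harder variant {i + 1})"
        for task in failed_tasks
        for i in range(n)
    ]


if __name__ == "__main__":
    tree = linear_to_tree(["search flights", "select flight", "pay"])
    print(count_paths(tree))  # 1
    expand(tree, {"select flight"})
    print(count_paths(tree))  # 4
    print(harder_variants(["book hotel"]))
```

In this sketch a three-step booking flow starts as one scenario and becomes four after a single step is branched, which is the sense in which a linear workflow grows into a behavior tree.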

Evaluations show that AgenticQwen-8B achieves an average score of 47.4 on benchmarks that use real tool environments (such as TAU-2 and BFCL-V4), far surpassing the base Qwen3-8B (23.8) and approaching Qwen3-235B (52.0). AgenticQwen-30B-A3B, which activates only 3B parameters, scores 50.2. The model has been deployed in an internal production system similar to Manus, where it significantly narrows the gap with the 235B model while offering shorter end-to-end inference time. However, the paper also acknowledges that, because of its 40K native context length, the small model still has limitations in deep search tasks.
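To put "approaching Qwen3-235B" in numbers, the scores quoted above can be turned into the share of the base-to-235B gap that the 8B model recovers. This is a back-of-the-envelope calculation on the article's figures only; the `gap_closed` helper is a name introduced here for illustration, not a metric from the paper.

```python
def gap_closed(base: float, tuned: float, reference: float) -> float:
    """Fraction of the base-to-reference gap recovered by the tuned model."""
    return (tuned - base) / (reference - base)


# Average scores quoted in the article.
QWEN3_8B = 23.8
AGENTICQWEN_8B = 47.4
QWEN3_235B = 52.0

share = gap_closed(QWEN3_8B, AGENTICQWEN_8B, QWEN3_235B)
print(f"{share:.1%}")  # 83.7%
```

The same calculation is not applied to AgenticQwen-30B-A3B, because the article does not give a score for its base model.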
