According to monitoring by Perceiving Beating, the Alibaba PAI team has released and open-sourced AgenticQwen, a series of small conversational language models designed for industrial-grade tool invocation, available in 8B and 30B-A3B versions. Trained with a new "dual-data flywheel" reinforcement learning framework, the models significantly reduce inference costs while delivering agent capabilities that approach those of hundred-billion-parameter-class large models.
The core mechanism is the "dual-data flywheel" training approach. Conventional synthetic data tends to become homogeneous, which causes model performance to plateau. AgenticQwen addresses this with two flywheels. The Inference Flywheel automatically generates more challenging task variants from the model's own mistakes. The Agent Flywheel uses the model's execution trajectories to expand simple linear workflows (such as a single booking flow) into multi-branch behavior trees that include constraints, refusals, and adversarial conditions, simulating complex real-world decision-making.
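The two flywheels described above can be illustrated with a minimal Python sketch. It is not the PAI team's implementation: the names `Task`, `harder_variants`, `Node`, `expand_workflow`, and `count_paths`, the branch labels, and the example booking steps are all hypothetical, and a real system would have a model generate the variants and branches rather than rely on fixed templates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Task:
    prompt: str
    difficulty: int = 1


def harder_variants(tasks: Sequence[Task],
                    passed: Callable[[Task], bool]) -> List[Task]:
    # Inference Flywheel step (hypothetical): keep only the tasks the model
    # failed and emit one harder variant of each for the next training round.
    return [Task(f"{t.prompt} [harder variant]", t.difficulty + 1)
            for t in tasks if not passed(t)]


@dataclass
class Node:
    action: str
    # Child nodes keyed by the condition under which they are taken.
    children: Dict[str, "Node"] = field(default_factory=dict)


# Assumed branch types, taken from the conditions named in the text.
BRANCH_KINDS = ("constraint", "refusal", "adversarial")


def expand_workflow(steps: Sequence[str]) -> Node:
    # Agent Flywheel step (hypothetical): turn a linear list of steps into a
    # behavior tree. Each step keeps its "ok" edge to the next step and gains
    # one extra branch per condition type.
    nxt = Node("done")
    for step in reversed(steps):
        node = Node(step, {"ok": nxt})
        for kind in BRANCH_KINDS:
            node.children[kind] = Node(f"handle_{kind}:{step}")
        nxt = node
    return nxt


def count_paths(node: Node) -> int:
    # Number of distinct root-to-leaf trajectories in the tree.
    if not node.children:
        return 1
    return sum(count_paths(child) for child in node.children.values())


booking = ["search_flights", "select_seat", "pay"]
tree = expand_workflow(booking)
print(count_paths(tree))  # 10 root-to-leaf paths, versus 1 in the linear flow
```

Even with this fixed template, a three-step booking flow grows from one trajectory to ten, which gives a rough sense of how branching diversifies the training data.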
Evaluations show that AgenticQwen-8B achieves an average score of 47.4 on benchmarks run in real tool environments (such as TAU-2 and BFCL-V4), far surpassing the base Qwen3-8B (23.8) and approaching Qwen3-235B (52.0). AgenticQwen-30B-A3B, which activates only 3B parameters, scores 50.2. The model has been deployed in an internal production system similar to Manus, where it significantly narrows the gap with the 235B model while offering shorter end-to-end inference time. However, the paper also acknowledges that, because of its 40K native context length, the small model remains limited on deep search tasks.
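One way to read the 8B scores is as the share of the gap between the base model and Qwen3-235B that training closes. This metric does not appear in the report. It is only an illustrative calculation on the three reported averages, and `gap_closed` is a hypothetical helper name.

```python
def gap_closed(base: float, tuned: float, reference: float) -> float:
    # Fraction of the base-to-reference gap recovered by the tuned model.
    return (tuned - base) / (reference - base)


# Reported average scores: Qwen3-8B, AgenticQwen-8B, Qwen3-235B.
share = gap_closed(base=23.8, tuned=47.4, reference=52.0)
print(f"{share:.1%}")  # 83.7%
```

By this reading, AgenticQwen-8B recovers roughly five sixths of the distance between Qwen3-8B and Qwen3-235B on these benchmarks.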
