
Qwen3.7-Max Officially Released: 35-Hour Autonomous Coding Marathon with 1158 Commits Produces a 10x Faster Operator on a Domestic Chip

According to OpenAI Beat monitoring, Alibaba's Qianwen (Qwen) has officially released Qwen3.7-Max, its new-generation flagship agent model. Officially disclosed real-world data shows that, with no chip architecture documentation or profiling data available, the new model raised the performance of a Triton operator on the domestic PlatHG TrueWar M890 processor by 10.0x during a fully autonomous kernel optimization task that ran for 35 hours and involved 1158 tool invocations.

During the optimization process, the model went through five core stages of evolution. First, it applied Split-K partitioning, dividing the prefix KV-cache along the token dimension so that all 36 SMs were kept busy. It then replaced the cudaMalloc calls that forced host-device synchronization with pre-allocated PyTorch tensors, and eliminated the synchronous cudaMemcpy used to fetch the query prefix length by reading that value from tensor metadata instead, removing the host-device communication overhead. In the final stage, the model refactored the operator so that a single thread block processes all 4 query tokens at once, sharing loads to amortize the memory access cost. This was the key architecture-specific specialization.
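The Split-K partitioning described above can be illustrated with a small pure-Python sketch. It is not the kernel the model wrote, which the article does not show. It only demonstrates the general idea of splitting the KV sequence along the token dimension, computing a partial softmax-attention result per chunk, and merging the partials with a rescaling step. All function names and the list-based data layout are hypothetical. On a GPU each chunk would map to its own thread block so that every SM has work.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q, keys, values):
    """Single-query softmax attention over the whole KV sequence."""
    scores = [_dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_partial(q, keys, values):
    """Process one KV chunk.

    Returns (local max score, local exp-sum, unnormalized weighted values).
    """
    scores = [_dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), acc


def attention_split_k(q, keys, values, num_splits):
    """Split the KV sequence along the token dimension and merge the partials."""
    n = len(keys)
    chunk = math.ceil(n / num_splits)
    partials = [attention_partial(q, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, n, chunk)]

    # Rescale every partial to a common maximum before summing, so the
    # merged result equals a softmax over the full sequence.
    global_max = max(m for m, _, _ in partials)
    dim = len(values[0])
    total = 0.0
    out = [0.0] * dim
    for m, s, acc in partials:
        scale = math.exp(m - global_max)
        total += s * scale
        for d in range(dim):
            out[d] += acc[d] * scale
    return [o / total for o in out]
```

Calling `attention_split_k(q, keys, values, num_splits=36)` should match `attention_reference(q, keys, values)` up to floating-point rounding. The value 36 mirrors the SM count quoted in the article, and the right split count on real hardware depends on sequence length and occupancy.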

Operator optimization measurements showed that Qwen3.7-Max achieved a 10.0x geometric mean speedup, well ahead of GLM 5.1 (7.3x) and Kimi K2.6 (5.0x). DeepSeek V4 Pro reached only 3.3x and terminated the task prematurely in its latter half, after five consecutive rounds in which it issued no tool invocations.
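For readers unfamiliar with the metric, a geometric mean speedup is normally computed by taking the speedup ratio for each workload and averaging those ratios in log space. The sketch below assumes per-workload latencies are available as two equal-length lists. The function name and the sample timings in the usage note are hypothetical, since the article does not list the benchmark's individual workloads.

```python
import math


def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-workload speedups (baseline / optimized)."""
    if len(baseline_ms) != len(optimized_ms) or not baseline_ms:
        raise ValueError("need two non-empty lists of equal length")
    ratios = [b / o for b, o in zip(baseline_ms, optimized_ms)]
    # Averaging the logs keeps one very large ratio from dominating the
    # result the way it would in an arithmetic mean.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

As an illustration with made-up timings, two workloads that speed up by 2x and 8x give a geometric mean of 4x, while their arithmetic mean would be 5x.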

To learn general problem-solving strategies that hold up in dynamic environments, Qwen3.7-Max's training decoupled tasks, execution frameworks, and validators, and used cross-framework reinforcement learning to avoid overfitting to shortcuts tailored to specific benchmarks. On the general agent benchmarks MCP-Mark (60.8 points) and SpreadSheetBench (87.0 points), Qwen3.7-Max showed strong generalization, with overall performance approaching that of Claude-4.6-Opus-Max.
