GeniusNet GLM-5.2 Tops Open-Source Models on DeepSWE for the First Time: Solves 44% of Complex Development Tasks, Outperforming Key Proprietary Models

According to Sentinel Beating monitoring, the SmartSpectrum AI open-source model GLM-5.2 has officially been added to DeepSWE, a long-horizon software engineering benchmark. In maximum thinking mode, its one-shot success rate on complex development tasks reached 44%, ranking first among open-source models. That is 13 percentage points higher than the previously listed Kimi K2.7 Code.

GLM-5.2 averages $3.92 per task, about 39% more than Kimi K2.7 Code at $2.82. Even so, it outperforms several mainstream closed-source models at the thinking settings shown in brackets, including Claude Sonnet 4.6 [high] (30%), Gemini 3.5 Flash [medium] (37%), and Claude Opus 4.8 [low] (41%).
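
One way to read the cost and success figures together is to estimate the spend per successfully solved task. The short Python sketch below does that under stated assumptions: the metric itself (average cost divided by success rate) is not reported by the benchmark, the quoted cost is assumed to be averaged over all attempted tasks, and Kimi K2.7 Code's 31% is inferred from the 13-percentage-point gap rather than quoted directly.

```python
# Figures taken from the article. Kimi K2.7 Code's 31% success rate is an
# inference (44% minus 13 percentage points), not a directly quoted number.
results = {
    "GLM-5.2": {"success_rate": 0.44, "cost_per_task": 3.92},
    "Kimi K2.7 Code": {"success_rate": 0.31, "cost_per_task": 2.82},
}


def cost_per_solved_task(success_rate: float, cost_per_task: float) -> float:
    """Illustrative metric: average spend per attempt divided by the
    fraction of attempts that succeed. Assumes the quoted cost is averaged
    over every attempted task, solved or not."""
    return cost_per_task / success_rate


summary = {
    name: round(cost_per_solved_task(**figures), 2)
    for name, figures in results.items()
}

for name, value in summary.items():
    print(f"{name}: ${value:.2f} per solved task")
```

On this rough measure the two models land close together (about $8.91 versus $9.10 per solved task), so the higher per-task price of GLM-5.2 is largely offset by its higher success rate.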

DeepSWE, a benchmark designed by Datacurve, specifically evaluates AI agents' ability to tackle long-horizon tasks. The test consists of 113 real-world coding problems covering 5 languages. Unlike traditional tests that require changing only a single line of code, DeepSWE requires the AI to make coordinated edits across multiple files, with the average code fix exceeding 600 lines. The evaluation runs in isolated containers with strict limits on CPU and memory resources.
