GLM 5.2 Reproduces an Academic Paper at About One-Eighth the Cost of Opus 4.8

According to monitoring by Perceive Beating, the open-source large model GLM 5.2 has proved highly cost-effective in a test of academic reproducibility. The alphaXiv research platform team used automated agents to test how well large models can reproduce cutting-edge papers. In reproducing the self-distillation for policy optimization (SDPO) paper, GLM 5.2's running cost was only about one-eighth that of the closed-source flagship model Claude Opus 4.8 Max.

The experiment required each model to read the paper autonomously, troubleshoot complex environment errors in the VeRL open-source library, and complete the ablation study. GLM 5.2 reproduced the paper successfully after 14 failed runs, consuming 2.65 million tokens at a total cost of $6.21. Claude Opus 4.8 Max succeeded after 9 failed runs, consuming 4.53 million tokens at a total cost of $46.35.
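The "about one-eighth" figure can be checked directly from the reported totals. The short Python sketch below is only a back-of-the-envelope check: the dictionary layout and function name are illustrative, and the per-million-token figure is a blended average derived from the totals above, not a published price from either vendor.

```python
# Figures as reported in the article; structure and names are illustrative.
runs = {
    "GLM 5.2": {"failed_runs": 14, "tokens_millions": 2.65, "cost_usd": 6.21},
    "Claude Opus 4.8 Max": {"failed_runs": 9, "tokens_millions": 4.53, "cost_usd": 46.35},
}


def blended_cost_per_million(run):
    """Average dollars per million tokens, derived from the reported totals."""
    return run["cost_usd"] / run["tokens_millions"]


glm = runs["GLM 5.2"]
opus = runs["Claude Opus 4.8 Max"]

# How many times more the Opus run cost, and how many times more tokens it used.
cost_ratio = opus["cost_usd"] / glm["cost_usd"]
token_ratio = opus["tokens_millions"] / glm["tokens_millions"]

for name, run in runs.items():
    rate = blended_cost_per_million(run)
    print(f"{name}: ${rate:.2f} per million tokens (blended)")

print(f"Cost ratio (Opus / GLM): {cost_ratio:.2f}x")
print(f"Token ratio (Opus / GLM): {token_ratio:.2f}x")
```

On these numbers the Opus run cost roughly 7.5 times as much as the GLM run, which sits between one-seventh and one-eighth. The gap comes from both a higher token count and a higher blended cost per token, even though GLM 5.2 needed more failed runs before succeeding.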
