GLM 5.2 Rises to Second Place in Long-Horizon Business Simulation Benchmark; Kimi and MiniMax Show Contrasting Results

According to Perceive Beating monitoring, the latest Vending-Bench 2 results released by Andon Labs show the open-source model GLM 5.2 taking second place. The benchmark uses code to simulate running a vending machine business for 365 days. Each day, the model receives its current inventory and financial data and makes decisions such as restocking and pricing through API calls. The goal is to test how consistently large language models make decisions over a long-horizon task. According to the data, every GLM version showed a strong linear growth trend, with average profit improving by nearly $1,000 per month (GLM 5 averaged $4,432, and GLM 5.1 rose to $5,634).
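To make the daily observe, decide, act cycle described above concrete, here is a toy sketch of such a simulation loop. It is not Andon Labs' code: the starting cash, unit cost, daily fee, demand curve and the `decide` rule are all hypothetical, and the fixed rule in `decide` merely stands in for the language model that, in the real benchmark, reads the state and issues restocking and pricing decisions through API calls.

```python
from dataclasses import dataclass

# All constants below are hypothetical, chosen only for illustration;
# they are not Vending-Bench 2's actual parameters.
START_CASH = 500.0
UNIT_COST = 1.0
DAILY_FEE = 2.0
DAYS = 365


@dataclass
class State:
    cash: float
    inventory: int


def demand(price: float) -> int:
    """Hypothetical linear demand curve: a higher price sells fewer units."""
    return max(0, int(30 - 7.5 * price))


def decide(state: State) -> dict:
    """Hypothetical stand-in for the model's daily restock and pricing decision.

    In the benchmark an LLM makes this choice via API calls after reading
    its inventory and finances; here a fixed rule plays that role.
    """
    restock = 20 if state.inventory < 10 else 0
    return {"restock": restock, "price": 2.0}


def step(state: State, action: dict) -> None:
    """Apply one day's decision: buy stock, sell units, pay the daily fee."""
    affordable = int(state.cash // UNIT_COST)
    qty = min(action["restock"], affordable)
    state.cash -= qty * UNIT_COST
    state.inventory += qty
    sold = min(state.inventory, demand(action["price"]))
    state.inventory -= sold
    state.cash += sold * action["price"]
    state.cash -= DAILY_FEE


def simulate(days: int = DAYS) -> list[float]:
    """Run the daily observe -> decide -> act loop; return end-of-day cash."""
    state = State(cash=START_CASH, inventory=0)
    history = []
    for _ in range(days):
        action = decide(state)  # the "model" sees the current state each day
        step(state, action)
        history.append(state.cash)
    return history


if __name__ == "__main__":
    balances = simulate()
    print(f"final balance after {len(balances)} days: ${balances[-1]:,.2f}")
```

Because the stand-in policy never changes, its balance climbs along a steady, roughly linear path; the benchmark's point is that a real model must sustain decisions of that consistency for a full simulated year.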

In contrast to GLM's steady progress, the latest versions of other major Chinese models delivered mixed results. Kimi K2.7 Code scored slightly lower than its predecessor, Kimi K2.6. MiniMax M3 improved significantly over the previous M2.5, but its overall profit still lags far behind the Kimi and GLM series.