GLM 5.2 Rises to Second Place in Long-Term Business Simulation Evaluation, Kimi and Minimax Show Contrasting Performance

According to monitoring by Perceive Beating, the latest Vending-Bench 2 evaluation released by Andon Labs shows the open-source model GLM 5.2 taking second place. The evaluation uses code to simulate running a vending machine business for 365 days: each day the model receives current inventory and financial data and makes decisions such as restocking and pricing through API calls. The goal is to assess how consistently large language models make decisions over a long-horizon task. The data show that all versions of GLM followed a strong linear growth trend in the evaluation, with an average monthly profit improvement of nearly $1,000 (GLM 5 averaged $4,432 and GLM 5.1 rose to $5,634).
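To make the evaluation setup concrete, the daily loop described above can be sketched roughly as follows. This is an illustrative toy only, not Andon Labs' actual harness: the `State` fields, the `simple_policy` stand-in for the model's API call, the fixed daily demand, and all numeric defaults are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class State:
    """What the model is shown each day (hypothetical fields)."""
    day: int
    cash: float
    inventory: int
    price: float


# A policy stands in for the model: it reads the state and returns a decision.
Policy = Callable[[State], Dict[str, float]]


def simple_policy(state: State) -> Dict[str, float]:
    """Hypothetical rule-based stand-in for an LLM decision made via API call."""
    restock = 20 if state.inventory < 10 else 0
    return {"restock": restock, "price": 2.0}


def run_simulation(
    policy: Policy,
    days: int = 365,
    unit_cost: float = 1.0,    # assumed wholesale cost per item
    daily_demand: int = 8,     # assumed fixed demand; a real benchmark would vary it
    start_cash: float = 500.0,
) -> State:
    state = State(day=0, cash=start_cash, inventory=0, price=2.0)
    for day in range(1, days + 1):
        state.day = day
        decision = policy(state)  # the model decides restocking and pricing

        # Restock, limited by what the current cash balance can pay for.
        affordable = int(state.cash // unit_cost)
        units = min(int(decision["restock"]), affordable)
        state.cash -= units * unit_cost
        state.inventory += units
        state.price = decision["price"]

        # Sell up to the day's demand from the available inventory.
        sold = min(daily_demand, state.inventory)
        state.inventory -= sold
        state.cash += sold * state.price
    return state


if __name__ == "__main__":
    final = run_simulation(simple_policy)
    print(f"Day {final.day}: cash={final.cash:.2f}, inventory={final.inventory}")
```

In a setup like this, a model that keeps making coherent restocking and pricing decisions across all 365 steps ends with a higher balance, which is the kind of long-horizon consistency the evaluation is described as measuring.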

In contrast to GLM's steady progress, the latest versions of other mainstream Chinese models delivered mixed results. Kimi K2.7 Code performed slightly worse than its predecessor, Kimi K2.6. Minimax M3 improved significantly over the previous M2.5, but its overall profit level still lags far behind the Kimi and GLM series models.
