
Google Duo Breaks High-Difficulty Stalemates as TERMS-Bench Turns AI Negotiation into a Bankruptcy Stress Test

According to monitoring by Dynamic Insight Beating, Erica Zhang of Stanford and colleagues have released TERMS-Bench, a test set for economic negotiation. It drops the black-box "LLM referee" used to score outcomes, so evaluators can see directly whether a model lost because of its bids, its concessions, or rule violations.

In the standard tests, Claude Opus 4.6 and ZhiPu GLM 5.1 took the top two spots. The researchers found that both adopted an aggressive "bid high, concede nothing" strategy that could squeeze opponents dry in profitable games.

However, in the hardest games, where the profit margin is extremely narrow, the aggressive strategy backfired because negotiations frequently broke down. Here the leaderboard flipped: Gemma 4 31B (an open-weight model) and Gemini 3.1 Pro, which knew how to make moderate concessions to secure the deal, rose to the top two spots, while the former leaders fell, with Claude dropping to 5th place and GLM to 9th.
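The flip follows from simple arithmetic: a negotiator's expected profit per game is roughly the probability that the deal closes times the share of the margin it captures. The sketch below uses made-up close rates, shares, and margins (none of these numbers come from TERMS-Bench) to show how a high-demand strategy can come out ahead when margins are wide and fall behind when they are narrow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    share: float                  # fraction of the margin the agent keeps when a deal closes
    close_rate: dict[str, float]  # hypothetical probability the deal closes, per game type


# Hypothetical parameters for illustration only; not taken from TERMS-Bench.
AGGRESSIVE = Strategy("aggressive", share=0.8, close_rate={"wide": 0.9, "narrow": 0.3})
MODERATE = Strategy("moderate", share=0.5, close_rate={"wide": 0.95, "narrow": 0.85})

# Hypothetical total surplus available in each game type.
MARGINS = {"wide": 100.0, "narrow": 10.0}


def expected_profit(strategy: Strategy, game: str) -> float:
    """Expected profit = P(deal closes) * share captured * available margin."""
    return strategy.close_rate[game] * strategy.share * MARGINS[game]


if __name__ == "__main__":
    for game in MARGINS:
        for s in (AGGRESSIVE, MODERATE):
            print(f"{game:>6} {s.name:>10}: {expected_profit(s, game):6.2f}")
```

With these assumed numbers the aggressive strategy earns about 72 versus 47.5 in the wide game but only about 2.4 versus 4.25 in the narrow one: once breakdowns become frequent, the larger share no longer makes up for the deals that never close.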

Beyond the limit-difficulty tests, the most striking part of the benchmark is Bankroll mode, which tests survival. It extends a single negotiation into a sequence of purchases: each agent starts with $100 in principal and negotiates for 50 rounds, with a fixed operating fee deducted every round. An agent that runs out of funds goes bankrupt. In this setting, even small negotiation mistakes can compound into a bankruptcy crisis.
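The Bankroll loop can be sketched as a small simulation. The $100 starting cash and 50 rounds come from the description above; the per-round fee, the rule that cash at or below zero counts as bankruptcy, and the simple agent that closes a deal with a fixed probability are all hypothetical assumptions, not TERMS-Bench's actual implementation.

```python
import random
from typing import Callable

# START_CASH and ROUNDS follow the article's description of Bankroll mode;
# FEE is a hypothetical value, since the article does not state it.
START_CASH = 100.0
ROUNDS = 50
FEE = 5.0


def run_bankroll(
    round_profit: Callable[[int], float],
    fee: float = FEE,
    start: float = START_CASH,
    rounds: int = ROUNDS,
) -> tuple[float, bool]:
    """Simulate one agent's bankroll; return (final_cash, survived).

    round_profit(r) is the profit negotiated in round r: 0.0 when talks break
    down, possibly negative for a bad deal. Treating cash <= 0 as bankruptcy
    is an assumption about how "running out of funds" is checked.
    """
    cash = start
    for r in range(rounds):
        cash += round_profit(r) - fee  # the fee is charged whether or not a deal closes
        if cash <= 0:
            return cash, False  # bankrupt: the run ends early
    return cash, True


def make_agent(close_prob: float, profit_per_deal: float, seed: int) -> Callable[[int], float]:
    """Hypothetical agent: closes a deal with probability close_prob, earning profit_per_deal."""
    rng = random.Random(seed)
    return lambda _r: profit_per_deal if rng.random() < close_prob else 0.0


def bankruptcy_rate(close_prob: float, profit_per_deal: float, trials: int = 1000) -> float:
    """Fraction of seeded runs that end in bankruptcy."""
    failures = sum(
        not run_bankroll(make_agent(close_prob, profit_per_deal, seed))[1]
        for seed in range(trials)
    )
    return failures / trials


if __name__ == "__main__":
    # An agent that never closes a deal is drained by fees alone after 20 rounds.
    print(run_bankroll(lambda _r: 0.0))
    # Same profit per deal, different close rates: breakdowns become bankruptcy risk.
    for p in (0.9, 0.5, 0.3):
        print(p, bankruptcy_rate(p, profit_per_deal=12.0))
```

Because the fee is charged every round, an agent's average per-round profit has to exceed the fee just to break even; below that, losses accumulate, and an early run of breakdowns can end the game before any good rounds arrive.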

The results show that GLM 5.1, Claude Opus 4.6, and the two Google models, despite their different strategies, were in a league of their own at managing the game: all achieved 100% survival and finished with between $380 and $443 in cash. By contrast, Grok 4.20 and GPT-4o-mini could not withstand the cash-flow drain, with bankruptcy rates of 25% and 50%, respectively.

The key to TERMS-Bench is not the deal-closure rate but the way it converts negotiation errors into cash losses and bankruptcy risk. Whether a model can persuade its opponent is only the first layer; what really decides the outcome is whether it can stay profitable and keep its cash flow healthy over continuous trading.
