Artificial Analysis's New AI Benchmark Shows Claude Is 44 Times More Expensive than DeepSeek

According to monitoring by DynaInsight Beating, the evaluation agency Artificial Analysis has revised the assessment criteria of its AI Intelligence Index. The new evaluation no longer relies only on multiple-choice questions; it now tests comprehensively whether an AI can autonomously plan, use tools, and solve complex tasks. It drops the old test of simple instruction following and adds high-difficulty scenarios, such as simulated real-world bank customer service conversations, and for the first time treats the money and time needed to complete a task as core assessment metrics.

In the latest results, Claude Fable 5, which has been taken offline by the U.S. government, achieved the highest score at 60 points. Among models currently available on the market, Claude Opus 4.8, the most expensive, took the top spot with 56 points, narrowly ahead of GPT-5.5 at 55 points. Chinese models also performed well: the open-source DeepSeek V4 Pro and MiniMax M3 both scored 44 points, followed closely by Kimi K2.6 at 43 points.

Costs differ sharply between models. Running the same task costs $1.78 with the state-of-the-art Claude Opus 4.8, but only $0.04 with the Chinese open-source DeepSeek V4 Pro. Claude's cost per task is therefore roughly 44 times that of DeepSeek. Task completion time also varies widely: the fastest model, xAI Grok 4.3, takes only 1.5 minutes, while the slowest, Claude Sonnet 4.6, takes 13.5 minutes.
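The ratios behind these figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable and function names are illustrative, and because the quoted prices are presumably rounded, the resulting ratios should be read as approximate.

```python
# Figures as quoted in the article: cost per task in USD, time per task in minutes.
cost_usd = {"Claude Opus 4.8": 1.78, "DeepSeek V4 Pro": 0.04}
time_min = {"xAI Grok 4.3": 1.5, "Claude Sonnet 4.6": 13.5}


def ratio(high: float, low: float) -> float:
    """Return how many times larger `high` is than `low`."""
    return high / low


# Cost gap between the most expensive listed model and DeepSeek V4 Pro.
cost_ratio = ratio(cost_usd["Claude Opus 4.8"], cost_usd["DeepSeek V4 Pro"])

# Time gap between the slowest and fastest listed models.
time_ratio = ratio(time_min["Claude Sonnet 4.6"], time_min["xAI Grok 4.3"])

print(f"cost ratio: {cost_ratio:.1f}x")
print(f"time ratio: {time_ratio:.1f}x")
```

On the quoted figures the cost ratio comes out slightly above 44, consistent with the headline's "44 times", and the slowest model takes about nine times as long as the fastest.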

GDPval-AA, a test of real-world knowledge work and the highest-weighted single test in this redesign, has been upgraded to version 2 and accounts for 20% of the evaluation. The new version sets the human benchmark score at 1000, introduces multiple cutting-edge models as rotating judges, and raises the limit on turns in a single conversation to 250.
