Epoch AI Releases Claude Skill-Profile Chart: Coding Strength Persists, Opus 4.6 and 4.7 Close the Math Gap

According to MetaAI Watch, Epoch AI has released its latest analysis of the domain-specific Epoch Capabilities Index (ECI). The analysis shows that, relative to their overall capability, Anthropic's Claude models have consistently been strong at coding and weak at mathematics. The newest data, however, show that this imbalance is shrinking quickly.

Across several previous generations, Claude models consistently scored higher on the software-engineering index (SWE-ECI) than on their overall ECI, while their mathematics index (Math-ECI) lagged persistently behind. The newly released Opus 4.6 and 4.7 narrow the gap between their math and overall scores to within 1 point, largely closing the earlier shortfall.

Because ECI is calculated by comparing models' performance relative to one another, it reflects how difficult a given task domain is for AI systems on average, not how difficult it is for humans.
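To illustrate why a relative scoring scheme measures difficulty "for AI", here is a minimal sketch. It assumes a simple Rasch-style logistic model in which the expected score of model m on benchmark b is sigmoid(ability[m] - difficulty[b]). This is an illustrative assumption, not Epoch AI's published fitting procedure, and the function name `fit_rasch` and the score matrix are hypothetical. Benchmark difficulties here are inferred purely from how the models perform, so a benchmark is "hard" only in the sense that the models in the sample tend to score poorly on it.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_rasch(scores, n_iters=2000, lr=0.5):
    """Fit a hypothetical Rasch-style model by gradient ascent.

    scores[m][b] is the fraction (0..1) of benchmark b solved by model m.
    Returns (ability, difficulty) on a shared logit scale.
    """
    n_models = len(scores)
    n_bench = len(scores[0])
    ability = [0.0] * n_models
    difficulty = [0.0] * n_bench
    for _ in range(n_iters):
        grad_a = [0.0] * n_models
        grad_d = [0.0] * n_bench
        for m in range(n_models):
            for b in range(n_bench):
                p = sigmoid(ability[m] - difficulty[b])
                # Derivative of the Bernoulli log-likelihood w.r.t. the logit
                err = scores[m][b] - p
                grad_a[m] += err
                grad_d[b] -= err
        for m in range(n_models):
            ability[m] += lr * grad_a[m] / n_bench
        for b in range(n_bench):
            difficulty[b] += lr * grad_d[b] / n_models
        # Only differences (ability - difficulty) are identified, so anchor
        # the scale by centring difficulties at zero and shifting abilities
        # by the same amount, which leaves every prediction unchanged.
        shift = sum(difficulty) / n_bench
        difficulty = [d - shift for d in difficulty]
        ability = [a - shift for a in ability]
    return ability, difficulty


if __name__ == "__main__":
    # Hypothetical scores: rows are models, columns are benchmarks.
    toy_scores = [[0.9, 0.6], [0.6, 0.2]]
    ability, difficulty = fit_rasch(toy_scores)
    print("ability:", [round(a, 2) for a in ability])
    print("difficulty:", [round(d, 2) for d in difficulty])
```

Centring the difficulties is the design choice that makes the scale relative: adding the same constant to every ability and difficulty leaves all predicted scores unchanged, so the numbers only mean something in comparison with the other models and benchmarks in the fit.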
