According to monitoring by MetaAI Watch, Epoch AI has released a new analysis of its Domain-specific Capability Index (domain-specific ECI), showing that Anthropic's Claude models have consistently been stronger in coding and weaker in mathematics relative to their overall capability. The latest data, however, indicates that this skew is rapidly narrowing.
Across previous model generations, Claude has consistently scored higher on the software-engineering benchmark (SWE-ECI) than on its overall ECI, while a persistent gap separated its mathematics benchmark score (Math-ECI) from the overall figure. The newly released Opus 4.6 and 4.7 models narrow the gap between mathematics and overall scores to within 1 point, closing the long-standing shortfall.
Because ECI is computed by comparing models' relative performance against one another, it directly reflects how difficult specific tasks are on average for AI systems, rather than for humans.
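To make the idea of a model-relative index concrete, the sketch below shows one simplified way such a measure could be computed. The benchmark names, scores, and normalization scheme here are illustrative assumptions only, not Epoch AI's actual ECI methodology: task difficulty is taken as the score deficit averaged across models, and a model's domain index is its score relative to the cross-model mean.

```python
# Illustrative sketch of a simplified relative capability index.
# NOTE: the benchmark names, scores, and formulas below are assumptions
# for illustration, not Epoch AI's actual ECI calculation.

from statistics import mean

# Hypothetical raw benchmark scores (fraction of tasks solved).
scores = {
    "model_a": {"swe": 0.62, "math": 0.48},
    "model_b": {"swe": 0.55, "math": 0.71},
    "model_c": {"swe": 0.40, "math": 0.52},
}

def task_difficulty(all_scores, task):
    """One minus the mean score across models: a value closer to 1
    marks a task that is harder for AI in general, regardless of
    how hard humans find it."""
    return 1.0 - mean(m[task] for m in all_scores.values())

def relative_index(all_scores, model, task):
    """A model's score on a task divided by the cross-model mean,
    so the index reflects standing among models, not raw accuracy.
    Values above 1 mean the model is above the model average."""
    cross_model_mean = mean(m[task] for m in all_scores.values())
    return all_scores[model][task] / cross_model_mean

print(task_difficulty(scores, "math"))
print(relative_index(scores, "model_b", "math"))
```

Under a scheme like this, a model can post a high relative index on coding while its raw accuracy stays modest, which is why a domain index can diverge from the overall score in the way the article describes for Claude.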
