
Can AI Replace Financial Analysts? Vals AI's New Test Ends in Disaster, GPT 5.5 Accuracy Just Above 50%

According to Dongcha Beating monitoring, the AI evaluation firm Vals AI has released the second generation of its finance agent benchmark, Finance Agent v2. It is an end-to-end test that simulates a junior financial analyst's workflow and comprises 927 expert-reviewed questions. The new version is markedly harder: GPT 5.5 takes the top spot with an accuracy of only 51.76%, narrowly ahead of Claude Opus 4.7 (51.51%) and Claude Sonnet 4.6 (51.03%).

Unlike single-turn Q&A, this test requires the model to locate the relevant passages on its own in 10-K and 10-Q filings that run to hundreds of pages, handle financial statement adjustments across fiscal years, and complete multi-step calculations in which every intermediate number must be exact. Vals AI revealed that under a strict "must get all correct" scoring standard, the accuracy of every frontier model falls below 40%; in the two hardest categories, "Financial Modeling" and "Precedent Analysis", the highest score is only 23%.
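
To illustrate why a strict standard pulls scores down so sharply, here is a minimal sketch in Python. It assumes each question is graded against several pass/fail checks and compares a lenient score (average share of checks passed) with a strict score (a question counts only if every check passes). The function names, the check-based grading model, and the sample data are all hypothetical and are not taken from Vals AI's actual methodology.

```python
from typing import Dict, List

# Hypothetical grading model: each question maps to a list of pass/fail checks.
Results = Dict[str, List[bool]]


def lenient_score(results: Results) -> float:
    """Average, over questions, of the fraction of checks passed."""
    per_question = [sum(checks) / len(checks) for checks in results.values()]
    return sum(per_question) / len(per_question)


def strict_score(results: Results) -> float:
    """Fraction of questions for which every single check passed."""
    fully_correct = [all(checks) for checks in results.values()]
    return sum(fully_correct) / len(fully_correct)


# Made-up sample data, for illustration only.
sample: Results = {
    "q1": [True, True, True],   # everything right
    "q2": [True, False, True],  # one intermediate number wrong
    "q3": [False, False],       # missed entirely
}

print(f"lenient: {lenient_score(sample):.2%}")
print(f"strict:  {strict_score(sample):.2%}")
```

In this toy data, one wrong intermediate number in q2 still earns partial credit under the lenient score but counts as a full miss under the strict one, which is the kind of gap the article describes for multi-step calculations.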

Among the other models, Kimi K2.6 ranks fifth with 44.87%, making it the highest-scoring Chinese model; GLM 5.1 (44.79%) and DeepSeek V4 (44.08%) follow closely. Claude Opus 4.7 earned the "Fastest Speed" label (360 seconds per run), while GLM 5.1 earned the "Most Cost-Efficient" label ($0.62 per run).

Scores fell across the board in this round: Opus 4.7 scored 64.4% on the previous-generation test and 51.51% here, a drop of about 13 percentage points. The takeaway is that current AI models can handle simple retrieval, but in complex financial work that demands adherence to industry-specific practices and a high level of numerical precision, they are still far from able to replace human analysts.
