Humans Clear 100%, Top AI Scores 0.37%: ARC-AGI-3 Tests True Agent Intelligence with "Unknown Games"

According to 1M AI News monitoring, the ARC Prize Foundation, the non-profit founded by Keras creator François Chollet and Zapier co-founder Mike Knoop, has released the ARC-AGI-3 benchmark. Unlike the previous two generations of static grid-reasoning tasks, ARC-AGI-3 is a set of interactive, turn-based environments: an agent acts in a 64×64, 16-color grid world without receiving any instructions or goal prompts, and must autonomously explore the environment, infer the rules and win conditions, build a world model, and plan a sequence of actions.
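
To make the setup concrete, here is a minimal sketch of the kind of turn-based loop this implies. The real ARC-AGI-3 API is not described in this article, so every name below (ToyGridEnv, reset, step, RandomAgent) is an assumption; the toy environment is a stand-in, not an actual ARC-AGI-3 level.

```python
import random

GRID_SIZE = 64    # 64x64 grid world, per the article
NUM_COLORS = 16   # 16 possible cell colors, per the article

class ToyGridEnv:
    """Stand-in environment (NOT a real ARC-AGI-3 level): the hidden rule is
    that action k paints cell (0, 0) with color k, and the level is solved
    when that cell turns color 7. The agent is never told any of this."""
    def __init__(self):
        self.action_space = list(range(NUM_COLORS))

    def reset(self):
        self.grid = [[0] * GRID_SIZE for _ in range(GRID_SIZE)]
        return self.grid

    def step(self, action):
        self.grid[0][0] = action
        solved = (self.grid[0][0] == 7)
        return self.grid, solved

class RandomAgent:
    """Placeholder policy that acts blindly; a real agent would have to
    explore, infer the rule and win condition, and plan an efficient
    sequence of actions instead."""
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation):
        return random.choice(self.action_space)

def run_episode(env, agent, max_actions=1000):
    """Turn-based loop: the agent only ever sees grids and a 'solved' flag."""
    obs = env.reset()
    for step in range(1, max_actions + 1):
        obs, solved = env.step(agent.act(obs))
        if solved:
            return step          # actions used to clear the level
    return None                  # failed within the action budget

if __name__ == "__main__":
    env = ToyGridEnv()
    print(run_episode(env, RandomAgent(env.action_space)))
```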

Scoring is based on an "action efficiency" mechanism: the fewer actions an agent needs to clear a level, the higher its score, which is intended to distinguish genuine reasoning from brute-force enumeration (a toy illustration of this idea follows the score list below). Each environment has been calibrated against human testers to confirm that it can be cleared 100% of the time on first contact. Leading AI model scores as of the release date:

1. Google Gemini 3.1 Pro Preview: 0.37%
2. OpenAI GPT 5.4 (High): 0.26%
3. Anthropic Opus 4.6 (Max): 0.25%
4. xAI Grok-4.20 (Beta): 0.00%
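
The official scoring formula is not given in this article, so the following is only a plausible sketch of an "action efficiency" metric: each cleared level is scored against a calibrated human step count, failures score zero, and the benchmark result is the average across levels. All numbers in the example are made up.

```python
def efficiency(agent_steps, human_steps):
    """1.0 when the agent matches the calibrated human step count, decaying as
    the agent needs more actions; 0.0 if the level was never cleared."""
    if agent_steps is None:
        return 0.0
    return min(1.0, human_steps / agent_steps)

def benchmark_score(results):
    """results: list of (agent_steps, human_steps) pairs, one per level.
    Returns a percentage averaged over all levels."""
    scores = [efficiency(a, h) for a, h in results]
    return 100.0 * sum(scores) / len(scores)

# Made-up example: one level cleared inefficiently, two levels failed.
print(benchmark_score([(120, 12), (None, 9), (None, 15)]))  # ~3.33
```

Under a rule like this, an agent that brute-forces its way through a level with thousands of actions earns almost nothing even if it eventually wins, which matches the stated goal of separating reasoning from enumeration.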

The new version was released partly out of concern that the previous benchmark had been "contaminated." The paper notes that Gemini 3 spontaneously used ARC-AGI's integer-to-color mapping (e.g., "3 = green") in its reasoning chain, even though the mapping was never mentioned in the prompt, strongly suggesting that the model's training data extensively covered ARC-AGI tasks. ARC-AGI-3 resists such memorization shortcuts through its interactive environments and autonomous goal-discovery mechanism. The total prize pool for the ARC Prize 2026 competition exceeds $2 million.
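
The paper's detection method is not spelled out here, but the finding implies a check along these lines: flag color words that appear in a model's reasoning trace even though the prompt only ever contained integers. The color vocabulary and the example trace below are illustrative, not taken from the paper.

```python
import re

# Illustrative color vocabulary; not claimed to be the exact ARC-AGI palette.
COLOR_WORDS = {"black", "blue", "red", "green", "yellow",
               "grey", "magenta", "orange", "cyan", "maroon"}

def leaked_color_words(prompt: str, reasoning_trace: str) -> set:
    """Color names the trace uses that the prompt never mentions."""
    words = lambda s: set(re.findall(r"[a-z]+", s.lower()))
    return (words(reasoning_trace) & COLOR_WORDS) - words(prompt)

# Example: the prompt shows only integers, yet the trace says "3 means green".
prompt = "Grid: 0 0 3 / 0 3 0 / 3 0 0"
trace = "The value 3 likely means green, forming a green diagonal."
print(leaked_color_words(prompt, trace))   # {'green'}
```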
