According to 1M AI News monitoring, the ARC Prize Foundation, the non-profit organization of Keras creator François Chollet and Zapier co-founder Mike Knoop, has released the ARC-AGI-3 benchmark. Unlike the static grid reasoning tasks of the previous two generations, ARC-AGI-3 is a set of interactive, turn-based environments. An agent acts in a 64×64, 16-color grid world without receiving any instructions or goal prompts, and must autonomously explore the environment, infer its rules and win conditions, build a world model, and plan a sequence of actions.
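The interaction pattern described above can be illustrated with a toy loop. This is only a sketch under stated assumptions: `ToyGridEnv`, its four-move action set, its color choices, and the random agent are all hypothetical stand-ins, not the ARC-AGI-3 API or any of its environments. Only the 64×64 grid size and the 16 integer-coded colors come from the report.

```python
import random

GRID_SIZE = 64    # from the report: 64x64 grid
NUM_COLORS = 16   # from the report: 16 colors, encoded here as integers 0..15
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}  # hypothetical action set


class ToyGridEnv:
    """Hypothetical stand-in environment; NOT the ARC-AGI-3 API."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.player = (rng.randrange(GRID_SIZE), rng.randrange(GRID_SIZE))
        # The win condition is never described to the agent; the goal cell
        # merely shows up in the grid as an unexplained color.
        self.goal = (rng.randrange(GRID_SIZE), rng.randrange(GRID_SIZE))

    def observe(self):
        """Return the only thing the agent sees: a grid of color integers."""
        grid = [[0] * GRID_SIZE for _ in range(GRID_SIZE)]
        grid[self.goal[0]][self.goal[1]] = 5      # arbitrary color for the goal
        grid[self.player[0]][self.player[1]] = 3  # arbitrary color for the player
        return grid

    def step(self, action):
        """Apply one action; return (observation, done). No reward text, no hints."""
        dr, dc = MOVES[action]
        row = min(max(self.player[0] + dr, 0), GRID_SIZE - 1)
        col = min(max(self.player[1] + dc, 0), GRID_SIZE - 1)
        self.player = (row, col)
        return self.observe(), self.player == self.goal


def run_random_agent(env, max_actions=1000, seed=0):
    """Brute-force baseline: act at random and count the actions used.

    Returns the number of actions taken to finish, or None if the budget runs out.
    """
    rng = random.Random(seed)
    for used in range(1, max_actions + 1):
        _, done = env.step(rng.choice(list(MOVES)))
        if done:
            return used
    return None
```

A random agent like this one is the kind of brute-force behavior the benchmark is meant to penalize; an agent that works out from the observations which cell matters would need far fewer actions.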
Scoring is based on "action efficiency": the fewer actions an agent needs to complete a level, the higher its score, a design meant to distinguish genuine reasoning from brute-force enumeration. Each environment has been calibrated with human testers to confirm that humans can clear it 100% of the time on first contact. Scores of leading AI models as of the release date:
1. Google Gemini 3.1 Pro Preview: 0.37%
2. OpenAI GPT 5.4 (High): 0.26%
3. Anthropic Opus 4.6 (Max): 0.25%
4. xAI Grok-4.20 (Beta): 0.00%
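The report does not give the exact scoring formula. As a rough illustration of how an action-efficiency score could produce sub-1% results, the sketch below assumes a per-level score equal to the human baseline action count divided by the agent's action count, capped at 1, with unfinished levels scoring 0. The function names and the formula itself are assumptions for illustration, not the foundation's published method.

```python
def action_efficiency(agent_actions, human_baseline_actions):
    """Hypothetical per-level score in [0, 1].

    agent_actions: actions the agent used to finish the level, or None if it
    never finished. human_baseline_actions: actions a human needed.
    """
    if agent_actions is None:
        return 0.0  # assumption: an unfinished level earns nothing
    if agent_actions <= 0 or human_baseline_actions <= 0:
        raise ValueError("action counts must be positive")
    # Fewer agent actions -> higher score; matching the human baseline caps at 1.
    return min(1.0, human_baseline_actions / agent_actions)


def benchmark_score(results):
    """Average per-level efficiency over (agent_actions, human_baseline) pairs, as a percentage."""
    if not results:
        raise ValueError("no levels to score")
    total = sum(action_efficiency(agent, human) for agent, human in results)
    return 100.0 * total / len(results)


# Example: one level matched the human baseline, one took 4x as many actions,
# and two were never finished.
example = [(50, 50), (200, 50), (None, 50), (None, 40)]
print(benchmark_score(example))
```

Under this assumed formula, an agent that stumbles through levels by trial and error, or fails most of them outright, ends up with a score close to zero even if it occasionally finishes a level.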
The new version was released partly because of concerns that the previous benchmarks had been "contaminated". The paper notes that Gemini 3 spontaneously used ARC-AGI's integer-to-color mapping (such as "3 = green") in its reasoning chain even though the mapping was never mentioned in the prompt, strongly suggesting that the model's training data covered ARC-AGI tasks extensively. ARC-AGI-3 is designed to resist such memorization shortcuts through interactive environments and autonomous goal discovery. The total prize pool for the ARC Prize 2026 competition exceeds $2 million.
