
ARC Prize Foundation Releases Largest-Ever ARC-AGI-3 Human Test Data: Humans Clear Every Level, AI Still Lags

According to WatchBeat Monitoring, the ARC Prize Foundation has released the human performance dataset for ARC-AGI-3. With 458 participants in total, it is the largest human testing study in the ARC-AGI series to date. The dataset includes 342 complete replay recordings of human play, covering 25 public environments, all of which have been open-sourced.

ARC-AGI-3 consists of 135 abstract reasoning environments. Testers received no gameplay instructions and had to explore, infer the rules, and develop strategies on their own. The tests were conducted at an in-person testing center in San Francisco, with each session lasting 90 minutes. Participants received base pay of around $130 plus a $5 reward for each environment they passed. All tests were run under the "first-pass" condition: each person saw a given environment only once and attempted it only once, so the results measure how well people learn and adapt when faced with completely new problems. Humans and AI systems received exactly the same information.

Key Findings: Every environment in ARC-AGI-3 was passed by humans. At least two independent participants completed each environment, and most environments were passed five or more times. The ARC Prize Foundation stated, "We have not yet achieved AGI, and this dataset is the evidence for that."

Since the ARC-AGI-3 preview, the public environments have received nearly 1 million AI evaluation submissions. Based on this data, the foundation has simultaneously announced two adjustments to the scoring rules. First, the human benchmark for each level changes from the "second-best player" to the "median player" to reduce the impact of luck on scores. Second, the score ceiling for a single level rises from 100% to 115% to prevent poor performance on one level from dragging down the overall score. The net effect of the two adjustments is a slight increase of about 0.5 percentage points in both human and AI scores.
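To make the two rule changes concrete, the sketch below compares the old and new settings on invented numbers. The article does not give the foundation's scoring formula, so everything here is an assumption for illustration only: the per-level score is taken to be the human baseline's action count divided by the AI's action count, clipped at the ceiling, and the overall score is taken to be the plain average across levels. The function names and the sample data are hypothetical.

```python
from statistics import median


def baseline_actions(human_counts, rule):
    """Pick the human reference action count for one level.

    Fewer actions is better, so the "second-best player" is the
    second-smallest count. Both rules are named in the article; how
    they enter the score is an assumption of this sketch.
    """
    ordered = sorted(human_counts)
    if rule == "second_best":
        return ordered[1] if len(ordered) > 1 else ordered[0]
    if rule == "median":
        return median(ordered)
    raise ValueError(f"unknown rule: {rule}")


def level_score(human_counts, ai_actions, rule, cap):
    """ASSUMED formula: baseline actions / AI actions, clipped at `cap`."""
    ratio = baseline_actions(human_counts, rule) / ai_actions
    return min(ratio, cap)


def overall_score(levels, rule, cap):
    """ASSUMED aggregation: unweighted mean of the per-level scores."""
    scores = [level_score(humans, ai, rule, cap) for humans, ai in levels]
    return sum(scores) / len(scores)


# Invented data: (human action counts, AI action count) for two levels.
levels = [
    ([40, 50, 60, 80, 100], 50),  # AI matches the second-best human
    ([20, 30, 40, 50, 60], 80),   # AI is much slower than every human
]

old = overall_score(levels, rule="second_best", cap=1.00)
new = overall_score(levels, rule="median", cap=1.15)
print(f"old rules: {old:.4f}, new rules: {new:.4f}")
```

With these invented numbers the strong first level can score above 100% under the new ceiling, which partly offsets the weak second level. The size of the shift here says nothing about the roughly 0.5 percentage point effect the foundation reported on real data.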



