ARC-AGI-3 Benchmark Vulnerable to Non-Intelligent Strategies
A new arXiv paper (2605.25931) systematically analyzes all 25 public ARC-AGI-3 games and finds that every one can be solved by a non-intelligent strategy, such as a blind step, a probing action, or repeated presses of the same button. Separately, a library-level null-coordinate vulnerability bypasses 18 of the games in one step. The authors argue that the public evaluation set therefore cannot distinguish intelligent exploration from trivial heuristics, which leaves the private 55-game evaluation as the only genuine test of intelligence. They also introduce AERA (Adaptive Epistemic Reasoning Agent), a three-phase agent (EXPLORE/VERIFY/PLAN) that reaches RHAE=0.2116 (4/25 games solved) using Qwen2.5-0.5B, while the random and no-explore baselines score 0.0000. Finally, the work formalizes a Speed–Depth trade-off framework under a convexity assumption.
Key facts
- All 25 public ARC-AGI-3 games are solvable via non-intelligent strategies.
- 10 games solved in a single blind step.
- 5 games solved after one probing action.
- 1 game solved via repeated ACTION1 presses.
- 1 game solved via diverse exploration.
- 8 games solved by repeating a single action for 50-200 steps.
- A library-level null-coordinate vulnerability bypasses 18 games in 1 step.
- AERA achieves RHAE=0.2116 (4/25 solved) with Qwen2.5-0.5B.
- Random and no-explore baselines score 0.0000.
- The authors argue that the private 55-game evaluation is the only genuine intelligence test.
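As a quick consistency check on the figures above, the per-strategy counts can be tallied to confirm they cover all 25 public games. This is a minimal sketch: the dictionary keys and variable names are illustrative labels chosen here, not identifiers from the paper or from any ARC-AGI-3 library.

```python
# Game counts per strategy, copied from the key facts above.
# The labels are illustrative, not names used by the paper.
strategy_counts = {
    "single blind step": 10,
    "one probing action": 5,
    "repeated ACTION1 presses": 1,
    "diverse exploration": 1,
    "single repeated action (50-200 steps)": 8,
}

# The five categories should account for every public game.
total_games = sum(strategy_counts.values())

# Shares of the public set: games bypassed by the null-coordinate
# vulnerability (18) and games solved by AERA (4).
null_coordinate_fraction = 18 / total_games
aera_solve_fraction = 4 / total_games

print(total_games)                        # 25
print(f"{null_coordinate_fraction:.2f}")  # 0.72
print(f"{aera_solve_fraction:.2f}")       # 0.16
```

The five categories sum to 25, matching the size of the public set. AERA's raw solve fraction (4/25 = 0.16) differs from its reported RHAE of 0.2116, so RHAE is evidently not a plain solve rate. The summary does not define the metric.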
Entities
Institutions
- arXiv