ARTFEED — Contemporary Art Intelligence

ARC-AGI-3 Benchmark Vulnerable to Non-Intelligent Strategies

ai-technology · 2026-05-26

A new paper on arXiv (2605.25931) systematically analyzes all 25 public ARC-AGI-3 games and finds that every one of them can be solved by non-intelligent strategies, including blind steps, probing actions, and repeated button presses. In addition, a library-level null-coordinate vulnerability bypasses 18 of the games in one step. The authors argue that the public evaluation set cannot distinguish intelligent exploration from trivial heuristics, which leaves the private 55-game evaluation as the only genuine test of intelligence. They introduce AERA (Adaptive Epistemic Reasoning Agent), a three-phase agent (EXPLORE/VERIFY/PLAN) that achieves an RHAE of 0.2116 (4 of 25 games solved) using Qwen2.5-0.5B, while the random and no-explore baselines score 0.0000. The paper also formalizes a Speed–Depth trade-off framework under a convexity assumption.
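To make the idea of a "repeated button press" strategy concrete, here is a minimal sketch of a fixed-action policy run against a toy game. Everything in it is an assumption made for illustration: `ToyGame`, its `step` method, and its win condition are hypothetical stand-ins, not the ARC-AGI-3 library API and not the paper's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGame:
    """Hypothetical stand-in for a game: won after `presses_needed` ACTION1 presses."""
    presses_needed: int
    presses: int = 0

    def step(self, action: str) -> bool:
        # Only ACTION1 makes progress; any other action is ignored.
        if action == "ACTION1":
            self.presses += 1
        return self.presses >= self.presses_needed


def repeat_action(game: ToyGame, action: str, max_steps: int) -> Optional[int]:
    """Press one action repeatedly; return the step at which the game is won, or None."""
    for step in range(1, max_steps + 1):
        if game.step(action):
            return step
    return None


print(repeat_action(ToyGame(presses_needed=50), "ACTION1", max_steps=200))  # 50
print(repeat_action(ToyGame(presses_needed=50), "ACTION2", max_steps=200))  # None
```

The policy never looks at the game state, which is the sense in which such a strategy is non-intelligent: it wins the toy game only because the win condition happens to depend on a single action.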

Key facts

  • All 25 public ARC-AGI-3 games are solvable via non-intelligent strategies.
  • 10 games solved in a single blind step.
  • 5 games solved after one probing action.
  • 1 game solved via repeated ACTION1 presses.
  • 1 game solved via diverse exploration.
  • 8 games solved by repeating a single action for 50 to 200 steps.
  • A library-level null-coordinate vulnerability bypasses 18 games in one step.
  • AERA achieves RHAE=0.2116 (4/25 solved) with Qwen2.5-0.5B.
  • Random and no-explore baselines score 0.0000.
  • According to the authors, the private 55-game evaluation is the only genuine intelligence test.
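
As a quick consistency check on the figures above, the per-strategy counts can be tallied to confirm that they cover all 25 public games. The dictionary keys are shorthand labels chosen here, not terms from the paper, and the 18-game null-coordinate figure is left out on the assumption that it overlaps the other categories rather than adding to them.

```python
# Per-strategy counts as listed in the key facts (labels are shorthand).
breakdown = {
    "single blind step": 10,
    "one probing action": 5,
    "repeated ACTION1 presses": 1,
    "diverse exploration": 1,
    "single repeated action, 50-200 steps": 8,
}

total = sum(breakdown.values())
print(total)  # 25

# Raw fraction of games AERA solved (4 of 25).
solve_fraction = 4 / 25
print(solve_fraction)  # 0.16
```

The raw solve fraction of 0.16 differs from the reported RHAE of 0.2116, so RHAE appears to measure something other than the plain share of games solved.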

Entities

Institutions

  • arXiv
