Coding Agent with World Model Achieves 28% Solve Rate on ARC-AGI-3
A recent arXiv preprint (arXiv:2605.05138) evaluates a coding-agent system for ARC-AGI-3 that maintains an executable Python world model. The agent checks this model against actual observations from the game and refactors it to keep it compact, a simplicity bias loosely analogous to the minimum description length (MDL) principle. It then plans inside the model before acting in the game. The system consists of a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, with no hand-coded game-specific logic. Across 25 public ARC-AGI-3 games, each playthrough starts a fresh agent instance with no access to files or conversations from earlier runs. The agent fully solved 7 games (a 28% solve rate) and exceeded 75% Relative Human Action Efficiency on 6 games; the paper also reports an average per-game score. Some games were played several times independently, and their results varied from run to run. The approach is deliberately simple, relying on explicit verification and refactoring rather than learned components.
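The verify, simplify, and plan loop described above can be illustrated with a minimal sketch. Everything in it is hypothetical: a toy 1-D track with positions 0 to 4 stands in for an ARC-AGI-3 game, and the names `CANDIDATES`, `verify`, `select_model`, `plan`, `execute`, and `true_env` are illustrative, not the paper's interfaces. Source-code length serves as a crude proxy for description length; because every surviving candidate reproduces the observations exactly, comparing model size alone is a simplified reading of MDL, not the paper's actual criterion.

```python
from collections import deque

ACTIONS = ("L", "R")

# Hypothetical candidate world models, kept as source text so that their
# length can act as a rough description-length proxy (an assumption).
CANDIDATES = [
    # Memorizing model: a lookup table of seen transitions, else "no change".
    """
def step(state, action):
    table = {(0, 'R'): 1, (1, 'R'): 2, (2, 'L'): 1}
    return table.get((state, action), state)
""",
    # Compact model: move one cell, clamped to the 0..4 track.
    """
def step(state, action):
    delta = 1 if action == 'R' else -1
    return max(0, min(4, state + delta))
""",
    # Wrong model: left and right are swapped.
    """
def step(state, action):
    delta = -1 if action == 'R' else 1
    return max(0, min(4, state + delta))
""",
]


def load(src):
    """Execute candidate source and return the step function it defines."""
    namespace = {}
    exec(src, namespace)
    return namespace["step"]


def verify(step, observations):
    """A candidate passes only if it reproduces every observed transition."""
    return all(step(s, a) == s_next for s, a, s_next in observations)


def select_model(candidates, observations):
    """Among verified candidates, keep the one with the shortest source."""
    passing = [src for src in candidates if verify(load(src), observations)]
    return min(passing, key=len) if passing else None


def plan(step, start, goal, max_depth=10):
    """Breadth-first search inside the world model; returns a list of actions."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        if len(actions) >= max_depth:
            continue
        for a in ACTIONS:
            nxt = step(state, a)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [a]))
    return None


def true_env(state, action):
    """Hypothetical ground-truth dynamics, hidden from the agent."""
    return max(0, min(4, state + (1 if action == "R" else -1)))


def execute(plan_actions, step, start, observations):
    """Run the plan in the real environment, logging every transition and
    stopping at the first prediction the model gets wrong."""
    state = start
    for a in plan_actions:
        predicted = step(state, a)
        actual = true_env(state, a)
        observations.append((state, a, actual))
        if predicted != actual:
            return state, False  # model falsified; caller should re-select
        state = actual
    return state, True


observations = [(0, "R", 1), (1, "R", 2), (2, "L", 1)]
best = load(select_model(CANDIDATES, observations))
actions = plan(best, start=0, goal=4)
print(actions)  # ['R', 'R', 'R', 'R']
final_state, consistent = execute(actions, best, 0, observations)
print(final_state, consistent)  # 4 True
```

In this toy run, the memorizing model and the compact model both pass verification, and the shorter compact model is kept. It also generalizes to the unvisited cells 3 and 4, where the lookup table would mispredict "no movement". When `execute` hits a wrong prediction, it logs the real transition, so re-running `select_model` on the enlarged log discards any candidate that got it wrong.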
Key facts
- arXiv:2605.05138 evaluates a coding-agent system for ARC-AGI-3
- Agent maintains an executable Python world model
- System uses scripted controller, predefined interfaces, verifier programs, plan executor
- No hand-coded game-specific logic
- Tested on 25 public ARC-AGI-3 games
- Each playthrough uses a fresh agent instance
- Agent fully solved 7 games (28% solve rate)
- Relative Human Action Efficiency >75% on 6 games
- Multiple playthroughs for some games show run-to-run variability
Entities
Institutions
- arXiv