Transformers Build Internal World Models Aligned with Sudoku's Constraint Structure

ai-technology · 2026-05-20

A new preprint on arXiv (2605.18847) reports that transformers trained on sequential reasoning traces develop internal world models that mirror the constraint algebra of the domain rather than its surface presentation. The researchers trained an 8-layer transformer on Sudoku solving traces and analyzed its internal computation mechanistically. They found that the model organizes information around rows, columns, and boxes, the structural units of Sudoku's constraints, instead of representing the board cell by cell. They also identified a 'naked-single circuit': dedicated neurons in the final MLP layer that detect when only one digit remains possible for a cell and reliably promote that digit. The findings indicate that the emergent world model in this transformer is shaped by the underlying structure of the task.
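
For readers unfamiliar with the term, the classical "naked single" rule that the circuit is named after can be sketched in a few lines of Python. This is only an illustration of the textbook Sudoku rule, not the model's learned computation or the authors' code; the function names (`candidates`, `naked_singles`, `demo_board`) and the board encoding (a 9x9 list of lists with 0 for an empty cell) are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

Board = List[List[int]]  # assumed encoding: 9x9 grid, 0 marks an empty cell


def candidates(board: Board, r: int, c: int) -> Set[int]:
    """Digits not yet used in the row, column, or 3x3 box containing (r, c)."""
    row = set(board[r])
    col = {board[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)  # top-left corner of the box
    box = {board[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return set(range(1, 10)) - row - col - box


def naked_singles(board: Board) -> Dict[Tuple[int, int], int]:
    """Map every empty cell that has exactly one candidate to that digit."""
    found = {}
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                cand = candidates(board, r, c)
                if len(cand) == 1:
                    found[(r, c)] = next(iter(cand))
    return found


def demo_board() -> Board:
    """A valid solved grid built from a shifting pattern (illustrative only)."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
            for r in range(9)]


board = demo_board()
board[0][0] = 0  # blank one cell; its row, column, and box each exclude 8 digits
print(naked_singles(board))  # {(0, 0): 1}
```

Note that the candidate set is computed entirely from the three constraint units a cell belongs to, which is the same row, column, and box structure the study reports finding in the model's internal representation.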

Key facts

Study published on arXiv with ID 2605.18847
8-layer transformer trained on Sudoku solving traces
Model builds substructure world model organized by rows, columns, and boxes
Naked-single circuit found in final MLP layer
Dedicated neurons detect when only one digit remains possible for a cell and promote that digit
World model geometry shaped by constraint algebra, not by surface presentation of the board
Mechanistic analysis of internal computation performed
