Looped Transformers: Fixed-Point Framework for Test-Time Scaling
This paper develops a theoretical framework for the stability and generalization of looped transformer architectures, which promise test-time compute scaling by spending more iterations on harder problems. The analysis is a fixed-point study along three axes: reachability, input-dependence, and geometry. It shows that looped networks without recall have only countably many fixed points and cannot achieve strong input-dependence in any spectral regime. In contrast, combining recall with outer normalization yields a well-behaved regime in which fixed points are reachable, locally smooth in the input, and supported by stable backpropagation. The theory is complemented by experiments training single-layer looped transformers on chess, sudoku, and prefix-sums tasks. The paper is available on arXiv under ID 2604.15259.
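As a concrete, highly simplified illustration of the recall-plus-outer-normalization recipe, the sketch below iterates a single nonlinear block to an approximate fixed point; the `looped_step` map, the tanh block, and the RMS-style `outer_norm` are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def outer_norm(z, eps=1e-6):
    # RMS-style normalization applied outside the looped block; keeps the
    # iterates on a bounded shell so the loop cannot diverge.
    return z / (np.sqrt(np.mean(z ** 2)) + eps)

def looped_step(z, x, W, U):
    # One iteration of a looped block *with recall*: the original input x is
    # re-injected at every step, so the fixed point can depend on x.
    return outer_norm(np.tanh(W @ z + U @ x))

def run_to_fixed_point(x, W, U, max_iters=256, tol=1e-5):
    # Test-time scaling knob: harder inputs simply get more iterations.
    z = np.zeros_like(x)
    for t in range(max_iters):
        z_next = looped_step(z, x, W, U)
        if np.linalg.norm(z_next - z) < tol:  # approximately a fixed point
            return z_next, t + 1
        z = z_next
    return z, max_iters

rng = np.random.default_rng(0)
d = 16
W = rng.normal(scale=0.5 / np.sqrt(d), size=(d, d))  # small weights, contraction-like regime
U = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))
z_star, iters = run_to_fixed_point(rng.normal(size=d), W, U)
print(f"settled after {iters} iterations")
```

Dropping the `U @ x` term recovers the no-recall variant: the iteration then ignores the input entirely after initialization, which is the failure mode the analysis formalizes.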
Key facts
- Looped transformers promise test-time compute scaling by spending more iterations on harder problems.
- A fixed-point-based framework analyzes looped architectures along three axes: reachability, input-dependence, and geometry.
- Looped networks without recall have only countably many fixed points and cannot achieve strong input-dependence in any spectral regime (see the worked equations after this list).
- Recall combined with outer normalization produces a regime with reachable, locally smooth fixed points and stable backpropagation.
- Empirical training of single-layer looped transformers was performed on chess, sudoku, and prefix-sums tasks (a toy prefix-sums setup is sketched at the end of this note).
- The paper is titled 'Stability and Generalization in Looped Transformers'.
- The paper is available on arXiv under ID 2604.15259.
- The study addresses whether looped architectures can extrapolate to harder problems at test time rather than memorize training-specific solutions.
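To make the recall dichotomy in the bullets above concrete, here is a schematic fixed-point argument in generic notation ($f$, $z$, $x$); this is an informal reconstruction, not the paper's exact statements:

```latex
% Without recall, the loop never sees the input again after step 0:
\[
  z_{t+1} = f(z_t)
  \quad\Longrightarrow\quad
  z^\star = f(z^\star),
\]
% an equation in which $x$ does not appear, so the candidate fixed points form
% a fixed (countable) set and the limit cannot vary smoothly with the input.
% With recall, the input is re-injected at every iteration:
\[
  z_{t+1} = f(z_t, x)
  \quad\Longrightarrow\quad
  z^\star(x) = f(z^\star(x), x),
\]
% and if $\lVert \partial_z f \rVert \le \rho < 1$ (a contraction in $z$), the
% implicit function theorem gives a locally smooth map $x \mapsto z^\star(x)$,
% i.e. genuine input-dependence of the computed answer.
```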
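For the empirical side, the prefix-sums task is simple enough to sketch as toy data (a hypothetical setup; the paper's exact task format, tokenization, and training protocol are not specified here):

```python
import numpy as np

def make_prefix_sum_batch(batch_size, length, rng):
    # Inputs are random bit strings; the target at position i is the running
    # sum of the bits up to i (reduced mod 2 here, a common binary variant).
    # Sequence length is the natural difficulty knob, matching the test-time
    # scaling story: longer inputs should warrant more loop iterations.
    x = rng.integers(0, 2, size=(batch_size, length))
    y = np.cumsum(x, axis=1) % 2
    return x, y

rng = np.random.default_rng(0)
x, y = make_prefix_sum_batch(batch_size=4, length=8, rng=rng)
print(x[0], "->", y[0])
```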