ARTFEED — Contemporary Art Intelligence

Fully Looped Transformer Stabilizes Training Without Extra Parameters

ai-technology · 2026-05-20

A new paper on arXiv proposes the Fully Looped Transformer, a modification of the Looped Transformer architecture that targets training instability. According to the authors, the instability stems from gradient oscillation and residual explosion as the number of loop iterations increases. They introduce two parameter-free modifications: a Fully Looped Architecture, which distributes inter-loop signals across all layers to mitigate residual explosion, and Attention Injection, which reuses the existing attention mechanisms. The approach lets performance scale with additional computation without increasing model size or context length, and it allows the number of loop iterations to be adjusted at inference to trade performance against test-time compute. The paper is available at arXiv:2605.18797.
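To make the general looping idea concrete, here is a minimal sketch of weight tying across loop iterations, written in plain Python. It is not the paper's architecture: the class `LoopedBlock`, the `tanh` residual update, and the `n_loops` argument are hypothetical stand-ins chosen for illustration, and neither the Fully Looped Architecture nor Attention Injection is modeled. The sketch only shows that the same weights can be applied a variable number of times, so compute can change at inference while the parameter count stays fixed.

```python
import math
import random


class LoopedBlock:
    """Toy weight-tied block (hypothetical): one weight matrix reused on every loop."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(dim)
        # A single weight matrix shared by all loop iterations.
        self.weight = [[rng.uniform(-scale, scale) for _ in range(dim)]
                       for _ in range(dim)]

    def num_parameters(self):
        # Depends only on dim, never on how many loops are run.
        return sum(len(row) for row in self.weight)

    def step(self, x):
        # Residual update x + tanh(W x); a stand-in for attention + MLP layers.
        update = [math.tanh(sum(w * v for w, v in zip(row, x)))
                  for row in self.weight]
        return [v + u for v, u in zip(x, update)]

    def forward(self, x, n_loops):
        # n_loops is chosen per call, so it can differ between training and inference.
        for _ in range(n_loops):
            x = self.step(x)
        return x


block = LoopedBlock(dim=4)
x0 = [1.0, 0.0, -1.0, 0.5]
shallow = block.forward(x0, n_loops=2)  # less test-time compute
deep = block.forward(x0, n_loops=8)     # more test-time compute, same weights
```

Because the weights are shared, running two loops and then six more gives the same result as running eight loops in one call, which is what makes the loop count a tunable inference-time setting in this toy.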

Key facts

  • arXiv:2605.18797
  • Looped Transformer suffers from training instability with increased loop iterations
  • Instability stems from gradient oscillation and residual explosion
  • Fully Looped Transformer introduces two parameter-free modifications
  • Fully Looped Architecture distributes inter-loop signals across all layers
  • Attention Injection reuses existing attention mechanisms
  • Loop iterations can be adjusted at inference
  • No increase in parameter count or context length
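As a rough illustration of how a residual stream can grow when the same update is reapplied on every loop, the toy below adds a fixed fraction of the state back onto itself at each iteration. This is an assumption-laden simplification, not the paper's analysis: the function `looped_residual_norms` and the scalar `gain` are hypothetical, and a real transformer block is nonlinear and usually normalized.

```python
import math


def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))


def looped_residual_norms(x, gain, n_loops):
    """Track the norm of a toy residual stream over repeated loops.

    Each loop applies x <- x + gain * x (hypothetical update), so the norm
    is multiplied by (1 + gain) on every iteration.
    """
    norms = [l2_norm(x)]
    for _ in range(n_loops):
        x = [v + gain * v for v in x]
        norms.append(l2_norm(x))
    return norms


norms = looped_residual_norms([1.0, -2.0, 0.5], gain=0.5, n_loops=6)
```

In this toy the norm grows geometrically with the loop count, which hints at why adding iterations without any mitigation could destabilize training.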

Entities

Institutions

  • arXiv
