ARTFEED — Contemporary Art Intelligence

LoopQ: Loop-Aware Quantization for Recursive Transformer Models

other · 2026-05-20

A new quantization framework called LoopQ addresses the fragility of looped language models (LoopLMs) under post-training quantization (PTQ). LoopLMs improve parameter efficiency by reusing Transformer blocks recursively, but this reuse causes distribution shifts, state mismatches, and error accumulation during quantization. LoopQ introduces a shared quantized backbone with lightweight adaptations including activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization. Experiments across seven benchmarks show that under W4A4 quantization, LoopQ improves average downstream accuracy by 68.8%.
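
To make the W4A4 setting and the error-accumulation concern concrete, here is a minimal Python sketch. It is not LoopQ and not the paper's code. The toy block (a random matrix followed by tanh), the symmetric per-tensor quantizer, and the names `fake_quantize`, `looped_forward`, and `per_loop_error` are all assumptions chosen for illustration. The sketch applies one shared block several times with 4-bit weights and 4-bit activations, then measures how far the quantized states are from the full-precision states after each loop.

```python
import math
import random


def fake_quantize(values, bits=4):
    """Symmetric per-tensor fake quantization (an illustrative scheme, not LoopQ's):
    snap each value to a signed integer grid, then rescale back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def looped_forward(weights, state, loops, bits=None):
    """Apply the same toy block `loops` times and return the state after each loop.
    If `bits` is set, weights are quantized once and activations before every loop."""
    if bits is not None:
        cols = len(weights[0])
        flat = fake_quantize([w for row in weights for w in row], bits)
        weights = [flat[i * cols:(i + 1) * cols] for i in range(len(weights))]
    trajectory = []
    for _ in range(loops):
        if bits is not None:
            state = fake_quantize(state, bits)
        state = [math.tanh(v) for v in matvec(weights, state)]
        trajectory.append(state)
    return trajectory


def per_loop_error(reference, quantized):
    """Euclidean distance between full-precision and quantized states, per loop."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, qnt)))
        for ref, qnt in zip(reference, quantized)
    ]


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
dim = 8
weights = [[rng.gauss(0, 1) / math.sqrt(dim) for _ in range(dim)] for _ in range(dim)]
state = [rng.gauss(0, 1) for _ in range(dim)]

full = looped_forward(weights, state, loops=4)
w4a4 = looped_forward(weights, state, loops=4, bits=4)
errors = per_loop_error(full, w4a4)
for loop_index, error in enumerate(errors, start=1):
    print(f"loop {loop_index}: state error = {error:.4f}")
```

Because the quantized state from one loop is the input to the next, any error introduced early is carried forward through every later application of the shared block. How quickly it grows in a real LoopLM depends on the model and is not something this toy example can establish.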

Key facts

  • LoopQ is a loop-aware PTQ framework for looped language models.
  • LoopLMs reuse Transformer blocks recursively for parameter efficiency.
  • Three challenges identified: distribution shift, state mismatch from reused states, recursive error accumulation.
  • LoopQ uses activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization.
  • Under W4A4 quantization, LoopQ improves average downstream accuracy by 68.8%.
  • Experiments conducted across seven benchmarks.
  • The paper is available on arXiv with ID 2605.16343.
  • This is the first systematic study of quantization in LoopLMs.
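
The distribution-shift challenge and the idea behind activation scaling can be illustrated with a small Python sketch. The summary above does not specify how LoopQ computes its scales, so everything here is an assumption: the activation values are made up, the max-abs scale rule is a common generic choice, and the helper names (`max_abs_scale`, `quantize_with_scale`, `squared_error`) are hypothetical. The sketch compares one activation scale shared by all loops against a separate scale per loop when the two loops see very different activation ranges.

```python
def max_abs_scale(values, bits=4):
    """Scale that maps the largest magnitude onto the top of the signed grid."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    return max(abs(v) for v in values) / qmax


def quantize_with_scale(values, scale, bits=4):
    """Round to the signed integer grid defined by `scale`, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def squared_error(original, quantized):
    """Sum of squared differences between two equal-length lists."""
    return sum((a - b) ** 2 for a, b in zip(original, quantized))


# Made-up activations: loop 1 has a narrow range, loop 2 a much wider one.
loop_activations = [
    [0.10, -0.25, 0.40, -0.55],
    [2.0, -5.0, 7.0, -3.5],
]

# One scale for every loop, fitted to the widest range.
shared_scale = max_abs_scale([v for acts in loop_activations for v in acts])
shared_error = sum(
    squared_error(acts, quantize_with_scale(acts, shared_scale))
    for acts in loop_activations
)

# One scale per loop, each fitted to that loop's own range.
per_loop_error = sum(
    squared_error(acts, quantize_with_scale(acts, max_abs_scale(acts)))
    for acts in loop_activations
)

print(f"shared scale error:   {shared_error:.4f}")
print(f"per-loop scale error: {per_loop_error:.4f}")
```

With these numbers the shared scale is set by the wide-range loop, so most of the narrow-range loop's activations round to zero. Per-loop scales avoid that while the quantized weights stay shared. This is only one plausible reading of "lightweight adaptations" on a shared backbone.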

Entities

Institutions

  • arXiv
