LoopQ: Loop-Aware Quantization for Recursive Transformer Models
LoopQ is a quantization framework that addresses the fragility of looped language models (LoopLMs) under post-training quantization (PTQ). LoopLMs improve parameter efficiency by applying the same Transformer blocks recursively, but once those blocks are quantized this reuse causes distribution shifts, state mismatches, and error accumulation. LoopQ keeps a shared quantized backbone and adds lightweight adaptations: activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization. In experiments across seven benchmarks, LoopQ improves average downstream accuracy by 68.8% under W4A4 quantization (4-bit weights, 4-bit activations).
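To make the error-accumulation problem concrete, the following is a minimal NumPy sketch, not the LoopQ method. It assumes a toy residual block standing in for a Transformer block and naive symmetric per-tensor 4-bit fake quantization of both weights and activations. The names `fake_quantize` and `looped_forward`, the block itself, and the tensor sizes are all hypothetical. The sketch reuses one block for several loops and reports how far the quantized state drifts from the full-precision state after each loop.

```python
import numpy as np

def fake_quantize(x, bits=4):
    """Symmetric per-tensor uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def looped_forward(x, weight, num_loops, quantize=False):
    """Apply one shared block num_loops times; return the state after each loop."""
    # The same (optionally quantized) weight is reused in every loop.
    w = fake_quantize(weight) if quantize else weight
    states = []
    h = x
    for _ in range(num_loops):
        # W4A4: the block input is quantized as well as the weight.
        a = fake_quantize(h) if quantize else h
        h = np.tanh(a @ w) + h                      # toy residual block (assumption)
        states.append(h)
    return states

rng = np.random.default_rng(0)
d = 64
x = rng.standard_normal((8, d))
weight = rng.standard_normal((d, d)) / np.sqrt(d)

fp_states = looped_forward(x, weight, num_loops=4)
q_states = looped_forward(x, weight, num_loops=4, quantize=True)

for i, (fp, q) in enumerate(zip(fp_states, q_states), start=1):
    rel_err = np.linalg.norm(fp - q) / np.linalg.norm(fp)
    act_range = np.max(np.abs(fp))
    print(f"loop {i}: relative error {rel_err:.3f}, max |activation| {act_range:.2f}")
```

Because each loop consumes the previous loop's already perturbed state, quantization error from early loops is carried into later ones, and the activation range seen by the shared block can differ from loop to loop. These are the effects the loop-aware adaptations are meant to counter.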
Key facts
- LoopQ is a loop-aware PTQ framework for looped language models.
- LoopLMs reuse Transformer blocks recursively for parameter efficiency.
- The paper identifies three challenges: distribution shift, state mismatch from state reuse, and recursive error accumulation.
- LoopQ uses activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization.
- Under W4A4 quantization, LoopQ improves average downstream accuracy by 68.8%.
- Experiments were conducted across seven benchmarks.
- The paper is available on arXiv with ID 2605.16343.
- This is the first systematic study of quantization in LoopLMs.
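As an illustration of why a shared quantized backbone may still need per-loop adaptation, the sketch below compares a single activation scale shared by all loops with one scale calibrated per loop. It is a generic calibration example under stated assumptions and does not reproduce LoopQ's activation scaling. The synthetic activations, their standard deviations, and the helper names `quantize_with_scale` and `relative_error` are hypothetical.

```python
import numpy as np

QMAX = 7  # largest magnitude representable by a signed 4-bit integer

def quantize_with_scale(x, scale):
    """Round to the signed 4-bit grid defined by scale, then dequantize."""
    return np.clip(np.round(x / scale), -QMAX - 1, QMAX) * scale

def relative_error(x, x_hat):
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)

rng = np.random.default_rng(0)

# Hypothetical calibration activations for the same layer input, captured at
# three loop iterations. Growing magnitude stands in for distribution shift.
loop_stds = [1.0, 3.0, 9.0]
calib = [rng.standard_normal((256, 64)) * s for s in loop_stds]

# Option A: one scale for every loop, set by the largest activation overall.
shared_scale = max(np.max(np.abs(a)) for a in calib) / QMAX
# Option B: one scale per loop (a few extra scalars on top of shared weights).
per_loop_scales = [np.max(np.abs(a)) / QMAX for a in calib]

shared_err = [relative_error(a, quantize_with_scale(a, shared_scale)) for a in calib]
per_loop_err = [
    relative_error(a, quantize_with_scale(a, s))
    for a, s in zip(calib, per_loop_scales)
]

for i, (es, ep) in enumerate(zip(shared_err, per_loop_err), start=1):
    print(f"loop {i}: shared-scale error {es:.3f}, per-loop-scale error {ep:.3f}")
```

With a shared scale, the 4-bit grid is sized for the loop with the widest range, so loops with small activations collapse onto very few levels. Per-loop scales avoid this while the quantized weights stay shared, which is the kind of lightweight, loop-specific parameter the key facts describe.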
Entities
Institutions
- arXiv