LoopQ: Loop-Aware Quantization for Recursive Transformer Models

other · 2026-05-20

LoopQ is a new quantization framework that addresses the fragility of looped language models (LoopLMs) under post-training quantization (PTQ). LoopLMs improve parameter efficiency by reusing Transformer blocks recursively, but this reuse causes distribution shifts, state mismatches, and error accumulation during quantization. LoopQ introduces a shared quantized backbone with lightweight adaptations: activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization. In experiments across seven benchmarks under W4A4 quantization (4-bit weights and 4-bit activations), LoopQ improves average downstream accuracy by 68.8%.

Key facts

LoopQ is a loop-aware PTQ framework for looped language models.
LoopLMs reuse Transformer blocks recursively for parameter efficiency.
The paper identifies three challenges: distribution shift, state reuse, and recursive error accumulation.
LoopQ uses activation scaling, selective transformation, cross-loop state alignment, and trajectory-aware optimization.
Under W4A4 quantization, LoopQ improves average downstream accuracy by 68.8%.
Experiments were conducted across seven benchmarks.
The paper is available on arXiv with ID 2605.16343.
This is the first systematic study of quantization in LoopLMs.
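
To make the distribution-shift and error-accumulation points concrete, here is a minimal toy sketch in Python. It is not LoopQ's algorithm: the one-layer tanh block, the helper names (fake_quant, absmax_scale, run_loops) and the absmax calibration rule are all assumptions chosen for illustration. It quantizes one shared weight matrix to 4 bits, reuses it in every loop, and compares a single activation scale calibrated on the first loop against a separate scale per loop.

```python
import math
import random

def fake_quant(values, scale, bits=4):
    """Symmetric uniform fake quantization: snap to a signed grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4 bits
    qmin = -qmax - 1             # -8 for 4 bits
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]

def absmax_scale(values, bits=4):
    """Scale that maps the largest magnitude onto the top of the grid (assumed rule)."""
    qmax = 2 ** (bits - 1) - 1
    m = max(abs(v) for v in values)
    return m / qmax if m > 0 else 1.0

def block(W, h):
    """Toy stand-in for a Transformer block: one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]

def run_loops(W, h, n_loops, act_scales=None):
    """Apply the same block n_loops times; act_scales[t] quantizes the input of loop t."""
    states = []
    for t in range(n_loops):
        if act_scales is not None:
            h = fake_quant(h, act_scales[t])
        h = block(W, h)
        states.append(h)
    return states

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
dim, n_loops = 16, 4
W = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
x = [random.gauss(0, 2.0) for _ in range(dim)]

# Full-precision reference trajectory, one hidden state per loop.
ref = run_loops(W, x, n_loops)

# Shared 4-bit weights: quantized once per row, reused in every loop.
Wq = [fake_quant(row, absmax_scale(row)) for row in W]

# Calibrate activation scales on the full-precision input of each loop.
loop_inputs = [x] + ref[:-1]
per_loop_scales = [absmax_scale(h) for h in loop_inputs]
shared_scales = [per_loop_scales[0]] * n_loops  # one scale taken from the first loop

err_shared = [mse(r, s) for r, s in zip(ref, run_loops(Wq, x, n_loops, shared_scales))]
err_per_loop = [mse(r, s) for r, s in zip(ref, run_loops(Wq, x, n_loops, per_loop_scales))]

for t in range(n_loops):
    print(f"loop {t}: shared-scale MSE={err_shared[t]:.5f}  per-loop-scale MSE={err_per_loop[t]:.5f}")
```

The printed per-loop errors show how far each quantized trajectory drifts from the full-precision one as the block is reused. In this toy setup the input to the first loop has a wider range than the tanh-bounded states fed to later loops, so a scale calibrated once is mismatched afterwards; this is only a rough analogue of the loop-dependent behavior the paper targets.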
