APQ and MAQEE: New Quantization Methods for Early-Exit Vision Transformers
A recent arXiv preprint (2605.07317) introduces Amortized-Precision Quantization (APQ) and Mutual Adaptive Quantization with Early Exiting (MAQEE) to address instability in low-precision early-exit Vision Transformers (ViTs). Existing quantization techniques assume a static, full-depth execution model, which breaks down when the exit decisions themselves are perturbed by quantization noise. APQ is a utilization-aware formulation that models each layer's stochastic exposure to this noise, making the trade-off between depth and precision explicit. MAQEE builds on this with a bi-level strategy that jointly optimizes exit thresholds and bit-widths under explicit risk management, stabilizing inference. Together, the methods establish a more favorable accuracy-efficiency Pareto frontier, cutting bit-operations (BOPs) by up to 95% without sacrificing accuracy and outperforming strong baselines.
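The core failure mode the paper targets can be illustrated with a minimal sketch, not the paper's actual method: an early-exit network picks the first head whose softmax confidence clears a threshold, and additive noise on the logits (standing in for quantization error) can flip that decision. All names, logit values, and the noise scale below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def confidence(logits):
    """Max softmax probability, a common early-exit criterion."""
    e = np.exp(logits - logits.max())
    return (e / e.sum()).max()

def early_exit_depth(exit_logits, threshold):
    """Index of the first exit head whose confidence clears the threshold."""
    for depth, logits in enumerate(exit_logits):
        if confidence(logits) >= threshold:
            return depth
    return len(exit_logits) - 1  # fall through to the final classifier

# Toy logits at three exit heads for one input.
clean = [np.array([1.0, 0.9, 0.8]),   # early head: low confidence
         np.array([2.5, 0.5, 0.1]),   # mid head: confident
         np.array([4.0, 0.2, 0.0])]   # final head: very confident

# Simulated quantization noise on the logits can change which head fires,
# which is why a static full-depth quantization model misjudges cost.
noisy = [x + rng.normal(scale=0.8, size=x.shape) for x in clean]

print(early_exit_depth(clean, threshold=0.7))
print(early_exit_depth(noisy, threshold=0.7))
```

Because exit depth is itself noise-dependent, the effective compute of a quantized early-exit model is a random variable, which is the kind of exposure a utilization-aware formulation has to account for.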
Key facts
- arXiv preprint 2605.07317 introduces APQ and MAQEE
- APQ is a utilization-aware formulation for quantization noise
- MAQEE jointly optimizes exit thresholds and bit-widths
- Method reduces BOPs by up to 95% while maintaining accuracy
- Addresses instability in low-precision early-exit ViTs
- Existing quantization methods assume static full-depth execution
- MAQEE establishes a superior Pareto frontier in accuracy-efficiency trade-off
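To see how a ~95% BOPs reduction is plausible, note that BOPs for a layer are commonly counted as MACs × weight bits × activation bits, so lowering bit-widths and shortening average executed depth compound multiplicatively. The toy arithmetic below uses hypothetical per-layer MAC counts, depths, and bit-widths, not figures from the paper.

```python
def layer_bops(macs, w_bits, a_bits):
    """Bit-operations for one layer: MACs x weight bits x activation bits."""
    return macs * w_bits * a_bits

MACS_PER_LAYER = 1_000_000  # hypothetical uniform per-layer MAC count
DEPTH = 12                  # e.g. a 12-block ViT-style backbone

# Baseline: 8-bit weights/activations, always executing all 12 blocks.
baseline = sum(layer_bops(MACS_PER_LAYER, 8, 8) for _ in range(DEPTH))

# Quantized early-exit budget: 4-bit layers, exiting after 3 blocks on average.
reduced = sum(layer_bops(MACS_PER_LAYER, 4, 4) for _ in range(3))

savings = 1 - reduced / baseline
print(savings)  # fraction of BOPs saved; here 0.9375
```

Halving the bit-widths alone saves 75% of BOPs; cutting average depth from 12 to 3 blocks multiplies the remaining cost by another 1/4, landing in the ~94% range, so savings of the magnitude reported are arithmetically consistent.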