Saliency-Aware Regularization Improves LLM Quantization Calibration
A new arXiv preprint introduces SARQC (Saliency-Aware Regularized Quantization Calibration), a framework that addresses generalization risk in post-training quantization (PTQ) for large language models (LLMs). Existing PTQ methods minimize layer-wise reconstruction error on a limited calibration set, which can let quantized weights drift far from the original weights and degrade downstream performance. SARQC adds a saliency-aware regularization term that encourages quantized weights to stay close to the original weights, reducing overfitting to the calibration data. The framework unifies scale-search and Gram-based methods under a single regularized objective. The paper is available at https://arxiv.org/abs/2605.05693.
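As a rough illustration (the notation below is assumed for exposition, not taken from the paper), a regularized layer-wise objective of this kind can be written as follows, for a layer with original weights W, quantized weights Ŵ, calibration inputs X, a saliency weighting S, and a trade-off coefficient λ:

```latex
\min_{\widehat{W}} \;
\underbrace{\lVert W X - \widehat{W} X \rVert_F^2}_{\text{reconstruction error on calibration data}}
\;+\;
\lambda \, \underbrace{\lVert S \odot (\widehat{W} - W) \rVert_F^2}_{\text{saliency-weighted closeness to original weights}}
```

Here λ balances fit to the calibration set against staying near the original weights, and S increases the penalty on weights deemed salient; under such a regularized objective, scale-search and Gram-based calibration methods can be viewed as special cases, which is the unification the paper describes.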
Key facts
- arXiv paper 2605.05693 introduces SARQC
- SARQC stands for Saliency-Aware Regularized Quantization Calibration
- PTQ is used to deploy LLMs under memory and latency constraints
- Existing PTQ methods minimize layer-wise reconstruction error on predetermined calibration data
- Limited calibration data can cause generalization risk and performance degradation
- SARQC adds a saliency-aware regularization term to the PTQ objective
- The regularization term encourages quantized weights to stay close to the original weights (see the code sketch after this list)
- The framework unifies scale search and Gram-based methods
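To make the regularization idea concrete, here is a minimal PyTorch-style sketch of a layer-wise calibration loss combining reconstruction error with a saliency-weighted closeness penalty. This is illustrative only; the function and parameter names (calibration_loss, saliency, lam) are assumptions, not the paper's implementation.

```python
import torch

def calibration_loss(w, w_q, x, saliency, lam=0.01):
    """w: original weights (out, in); w_q: quantized weights (out, in);
    x: calibration activations (tokens, in); saliency: per-weight
    importance (out, in); lam: regularization strength."""
    # Standard layer-wise reconstruction error on the calibration batch.
    recon = ((x @ w.T - x @ w_q.T) ** 2).mean()
    # Saliency-aware penalty keeping quantized weights near the originals,
    # pulling salient weights back more strongly.
    reg = (saliency * (w_q - w) ** 2).mean()
    return recon + lam * reg

# Example usage with random tensors (shapes chosen arbitrarily).
out_dim, in_dim, n_tokens = 8, 16, 32
w = torch.randn(out_dim, in_dim)
w_q = torch.round(w * 4) / 4                             # stand-in for a quantized copy of w
x = torch.randn(n_tokens, in_dim)                        # calibration activations
saliency = x.abs().mean(dim=0).expand(out_dim, in_dim)   # e.g. activation-based importance
loss = calibration_loss(w, w_q, x, saliency)
```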