Saliency-Aware Regularization Improves LLM Quantization Calibration
A new arXiv preprint introduces SARQC (Saliency-Aware Regularized Quantization Calibration), a framework that addresses generalization risk in post-training quantization (PTQ) for large language models (LLMs). Existing PTQ methods minimize layer-wise reconstruction error on a limited calibration set, which can let quantized weights drift far from the original weights and degrade downstream performance. SARQC adds a saliency-aware regularization term that encourages quantized weights to stay close to the original weights, reducing overfitting to the calibration data. The framework unifies scale-search and Gram-based methods under a single regularized objective. The paper is available at https://arxiv.org/abs/2605.05693.
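As a rough illustration (the notation below is assumed for exposition, not taken from the paper), a regularized layer-wise objective of this kind can be written as follows, for a layer with original weights W, quantized weights Ŵ, calibration inputs X, a saliency weighting S, and a trade-off coefficient λ:

```latex
\min_{\widehat{W}} \;
\underbrace{\lVert W X - \widehat{W} X \rVert_F^2}_{\text{reconstruction error on calibration data}}
\;+\;
\lambda \, \underbrace{\lVert S \odot (\widehat{W} - W) \rVert_F^2}_{\text{saliency-weighted closeness to original weights}}
```

Here λ balances fit to the calibration set against staying near the original weights, and S increases the penalty on weights deemed salient; under such a regularized objective, scale-search and Gram-based calibration methods can be viewed as special cases, which is the unification the paper describes.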
Key facts
- arXiv paper 2605.05693 introduces SARQC
- SARQC stands for Saliency-Aware Regularized Quantization Calibration
- PTQ is used to deploy LLMs under memory and latency constraints
- Existing PTQ methods minimize layer-wise reconstruction error on predetermined calibration data
- Limited calibration data can cause generalization risk and performance degradation
- SARQC adds a saliency-aware regularization term to the PTQ objective
- The regularization term encourages quantized weights to stay close to the original weights (see the code sketch after this list)
- The framework unifies scale search and Gram-based methods
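To make the regularization idea concrete, here is a minimal PyTorch-style sketch of a layer-wise calibration loss combining reconstruction error with a saliency-weighted closeness penalty. This is illustrative only; the function and parameter names (calibration_loss, saliency, lam) are assumptions, not the paper's implementation.

```python
import torch

def calibration_loss(w, w_q, x, saliency, lam=0.01):
    """w: original weights (out, in); w_q: quantized weights (out, in);
    x: calibration activations (tokens, in); saliency: per-weight
    importance (out, in); lam: regularization strength."""
    # Standard layer-wise reconstruction error on the calibration batch.
    recon = ((x @ w.T - x @ w_q.T) ** 2).mean()
    # Saliency-aware penalty keeping quantized weights near the originals,
    # pulling salient weights back more strongly.
    reg = (saliency * (w_q - w) ** 2).mean()
    return recon + lam * reg

# Example usage with random tensors (shapes chosen arbitrarily).
out_dim, in_dim, n_tokens = 8, 16, 32
w = torch.randn(out_dim, in_dim)
w_q = torch.round(w * 4) / 4                             # stand-in for a quantized copy of w
x = torch.randn(n_tokens, in_dim)                        # calibration activations
saliency = x.abs().mean(dim=0).expand(out_dim, in_dim)   # e.g. activation-based importance
loss = calibration_loss(w, w_q, x, saliency)
```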