New Attack Exploits LLM Quantization via Outlier Injection
Researchers have demonstrated the first quantization-conditioned attack that reliably induces harmful behavior in large language models (LLMs) quantized with widely used methods such as AWQ, GPTQ, and GGUF I-quants. The attack exploits a property shared by many modern quantization schemes: large outlier weights are preserved nearly unchanged through quantization. Prior attacks of this kind worked only against simpler quantization approaches and failed against these more widely deployed methods. As a result, an adversary can release a model that appears benign in full precision but becomes malicious once users quantize it, posing a serious security threat to memory-efficient LLM deployment.
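The mechanism can be illustrated with a minimal sketch. The round-to-nearest group quantizer below is a simplification and the numbers are invented for illustration, not taken from the paper, but it shows the slack an attacker can exploit: the scale-setting outlier passes through quantization essentially unchanged, while distinct small weights collapse to the same quantized values.

```python
import numpy as np

def quantize_dequantize(w, bits=4):
    """Symmetric round-to-nearest quantization of one weight group (sketch).

    The per-group scale is pinned by the largest-magnitude ("outlier") weight,
    so that weight comes back from quantization essentially unchanged.
    """
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit
    scale = np.max(np.abs(w)) / qmax      # the outlier sets the scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                      # dequantized weights

# Two weight groups that differ in full precision ...
w_a = np.array([0.70, -0.052, 0.031, 0.010])
w_b = np.array([0.70, -0.058, 0.039, 0.004])

# ... but collapse to the same quantized model: each small weight can move
# anywhere inside its rounding interval without changing the quantized output,
# while the 0.70 outlier survives quantization exactly.
print(quantize_dequantize(w_a))   # ~[ 0.7 -0.1  0.   0. ]
print(quantize_dequantize(w_b))   # ~[ 0.7 -0.1  0.   0. ]
```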
Key facts
- First quantization-conditioned attack effective on AWQ, GPTQ, and GGUF I-quants
- Exploits large outlier weights that remain effectively invariant under quantization
- Prior attacks limited to simpler quantization methods
- Adversary can release a benign-looking full-precision model that turns malicious after quantization (see the sketch after this list)
- Published on arXiv with ID 2605.15152v1
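Building on the same simplified quantizer, the following is a hedged sketch of how the "benign in full precision, malicious once quantized" property could be enforced: start from weights whose quantized form is malicious, then repair full-precision behavior while projecting every weight back into the interval that leaves the quantized model untouched. The function names, the projection scheme, and the repair update below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def preserving_interval(w, bits=4):
    """Per-weight interval in which full-precision values can move without
    changing the round-to-nearest quantized model (illustrative assumption:
    the scale-setting outlier itself is left untouched)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    # Kept strictly inside each rounding bucket to avoid ties at the boundary.
    return (q - 0.499) * scale, (q + 0.499) * scale

def project(w, lo, hi):
    """Clamp repaired weights back into the quantization-preserving interval."""
    return np.clip(w, lo, hi)

# Hypothetical flow: w_mal quantizes to a malicious model; the attacker then
# nudges the full-precision weights toward benign behavior (the delta would
# come from fine-tuning in practice) and projects after every step, so the
# released full-precision model behaves benignly while its quantized form
# stays malicious.
w_mal = np.array([0.70, -0.058, 0.039, 0.004])
lo, hi = preserving_interval(w_mal)
delta = np.array([0.0, 0.009, -0.012, 0.008])     # stand-in repair update
w_released = project(w_mal + delta, lo, hi)
```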
Entities
Institutions
- arXiv