Two Failure Modes of LLM Quantization Identified: Signal Degradation vs Computation Collapse
A new arXiv paper (2604.19884) presents a systematic mechanistic analysis of Post-Training Quantization (PTQ) in Large Language Models (LLMs). While 4-bit quantization is widely considered the optimal trade-off, pushing to 2-bit typically triggers a catastrophic performance cliff, and the paper traces this cliff to two qualitatively distinct failure modes. In the first, Signal Degradation, the model's computational patterns are preserved but information precision is impaired by cumulative quantization error. In the second, Computation Collapse, quantization destroys key components in early layers, preventing correct information processing downstream. The study demonstrates that targeted, training-free repair can mitigate Signal Degradation but remains ineffective against Computation Collapse.
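The scale of the 2-bit cliff is easy to reproduce with the crudest form of PTQ. The sketch below is illustrative code written for this summary, not the paper's setup: it applies per-tensor symmetric round-to-nearest quantization to a Gaussian toy weight matrix (real PTQ methods use finer-grained per-channel or per-group scales) and prints the relative reconstruction error at 8, 4, and 2 bits.

```python
import numpy as np

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform round-to-nearest: quantize, then dequantize."""
    qmax = 2 ** (bits - 1) - 1        # 127 at 8-bit, 7 at 4-bit, 1 at 2-bit
    scale = np.abs(w).max() / qmax    # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(1024, 1024))  # toy "weight matrix"

for bits in (8, 4, 2):
    rel_err = np.linalg.norm(w - fake_quantize(w, bits)) / np.linalg.norm(w)
    print(f"{bits}-bit relative reconstruction error: {rel_err:.3f}")
```

At 2 bits the grid has only four levels, so the step size equals the largest weight magnitude and any weight smaller than half of it rounds to zero; that discontinuous loss of representable detail, rather than a smooth continuation of the 8-bit-to-4-bit trend, is the arithmetic behind the cliff.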
Key facts
- Paper arXiv:2604.19884 analyzes LLM quantization failure modes.
- Two distinct failure modes identified: Signal Degradation and Computation Collapse.
- Signal Degradation preserves computational patterns but impairs precision via cumulative error.
- Computation Collapse destroys key components in early layers (both modes are caricatured in the sketch after this list).
- 4-bit quantization is widely regarded as the optimal trade-off.
- 2-bit quantization typically triggers a catastrophic performance cliff.
- Training-free repair can mitigate Signal Degradation.
- Training-free repair is ineffective for Computation Collapse.
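As a companion to the list above, here is a toy caricature of how the two modes differ mechanically. This is illustrative code written for this summary, with assumptions labeled in the comments; it does not reproduce the paper's analysis, and since the paper's repair method is not detailed here, none is implemented. Part one quantizes every layer of a small tanh stack to 2 bits with per-group scales and prints how activation error compounds with depth (Signal Degradation). Part two shows a collapse mechanism in the weights themselves: planting an outlier in every quantization group blows up the scales, and the ordinary weights of that component all round to zero (Computation Collapse).

```python
import numpy as np

rng = np.random.default_rng(0)
DEPTH, DIM, GROUP = 8, 256, 16    # toy sizes, not taken from the paper

def quantize_2bit(w: np.ndarray, group: int) -> np.ndarray:
    """2-bit symmetric round-to-nearest with one scale per `group` weights."""
    g = w.reshape(-1, group)
    scale = np.abs(g).max(axis=1, keepdims=True)   # qmax = 1 at 2 bits
    return (np.clip(np.round(g / scale), -2, 1) * scale).reshape(w.shape)

# --- Signal Degradation: each layer's error is survivable, but it compounds
layers = [rng.normal(0.0, DIM ** -0.5, (DIM, DIM)) for _ in range(DEPTH)]
x_ref = x_q = rng.normal(0.0, 1.0, DIM)            # shared input
for depth, w in enumerate(layers, start=1):
    x_ref = np.tanh(w @ x_ref)                     # full-precision run
    x_q = np.tanh(quantize_2bit(w, GROUP) @ x_q)   # 2-bit run
    rel = np.linalg.norm(x_q - x_ref) / np.linalg.norm(x_ref)
    print(f"layer {depth}: relative activation error {rel:.2f}")

# --- Computation Collapse: outliers blow up the scales and erase the bulk
clean = rng.normal(0.0, DIM ** -0.5, (DIM, DIM))
broken = clean.copy()
broken[:, ::GROUP] = 1.0           # one extreme outlier in every group
for name, w in (("clean", clean), ("outliers", broken)):
    wq = quantize_2bit(w, GROUP)
    bulk = np.abs(w) < 0.5         # the ordinary (non-outlier) weights
    err = np.linalg.norm((w - wq)[bulk]) / np.linalg.norm(w[bulk])
    print(f"{name}: bulk-weight relative error {err:.2f}")
```

In the first part the error grows layer by layer but the quantized run still tracks the reference, which is the kind of structured residue a training-free correction can act on. In the second part the outlier-laden matrix shows a bulk-weight error of 1.00: every ordinary weight in that component was rounded away, leaving nothing for a post-hoc fix to recover, which is consistent with the paper's finding that repair fails under Computation Collapse.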