Junk DNA Hypothesis: Pruning Small Weights Impairs LLM Performance on Difficult Tasks
A new study challenges the prevailing belief that large language models (LLMs) contain significant redundancy in their pre-trained weights. The research, presented in arXiv:2310.02277, introduces the "Junk DNA Hypothesis" from a task-centric perspective. Contrary to the assumption that many parameters can be pruned without loss, the authors demonstrate that small-magnitude weights encode knowledge essential for difficult downstream tasks: the performance drop caused by pruning them grows monotonically with task difficulty. The study further shows that removing these seemingly inconsequential weights causes irreversible knowledge loss that cannot be recovered even when downstream continual training is permitted. The findings underscore the critical role of small-magnitude weights in LLMs and caution against aggressive pruning strategies.
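For readers unfamiliar with the operation at issue, the sketch below shows global magnitude pruning, i.e., zeroing out the smallest-magnitude weights across a model's linear layers. It uses PyTorch's `torch.nn.utils.prune` and Hugging Face `transformers` purely for illustration; the model choice (`facebook/opt-125m`) and the 50% sparsity level are assumptions, not the paper's exact setup.

```python
# Minimal sketch of global magnitude pruning: the smallest-magnitude weights
# are set to zero across all Linear layers. Model name and sparsity level are
# illustrative placeholders, not the configuration used in the paper.
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Collect the Linear weight matrices that magnitude pruning typically targets.
targets = [
    (module, "weight")
    for module in model.modules()
    if isinstance(module, torch.nn.Linear)
]

# Zero out the 50% smallest-magnitude weights, ranked globally across layers.
prune.global_unstructured(targets, pruning_method=prune.L1Unstructured, amount=0.5)

# Fold the binary masks into the weights so the sparsity becomes permanent.
for module, name in targets:
    prune.remove(module, name)
```

The pruned model can then be evaluated on downstream tasks of varying difficulty to observe the degradation the study reports.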
Key facts
- Study titled "Junk DNA Hypothesis" challenges the assumption of redundancy in LLM pre-trained weights.
- Small-magnitude weights are crucial for difficult downstream tasks.
- The performance drop from pruning these weights grows monotonically with task difficulty.
- Knowledge lost to pruning is irreversible, even with downstream continual training.
- The research adopts a task-centric perspective on pre-trained weights.
- Paper published on arXiv as preprint 2310.02277.
- Contradicts the belief that LLMs contain significant weight redundancy.
- Evaluations show performance degradation across the full difficulty spectrum (see the sketch below for how such a comparison can be tallied).
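As a hedged illustration of the difficulty-stratified comparison described above, the helper below tallies accuracy per difficulty level for a dense versus a pruned model. The bucketing scheme and variable names are hypothetical, not the paper's evaluation protocol.

```python
# Hypothetical helper: compute accuracy per task-difficulty bucket so that
# dense and pruned models can be compared level by level. Inputs are aligned
# lists of predictions, gold labels, and difficulty tags (placeholders).
from collections import defaultdict

def accuracy_by_difficulty(predictions, labels, difficulties):
    """Return {difficulty_level: accuracy} for aligned prediction/label lists."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, gold, level in zip(predictions, labels, difficulties):
        total[level] += 1
        correct[level] += int(pred == gold)
    return {level: correct[level] / total[level] for level in total}

# Usage sketch: under the Junk DNA Hypothesis, the gap between
# dense_acc[level] and pruned_acc[level] widens as difficulty increases.
# dense_acc  = accuracy_by_difficulty(dense_preds,  golds, levels)
# pruned_acc = accuracy_by_difficulty(pruned_preds, golds, levels)
```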