Unstructured Pruning Boosts LLM Test-Time Scaling
A new study challenges the assumption that pruning degrades reasoning in large language models (LLMs). Structured pruning, which removes entire layer blocks, was previously shown to harm test-time scaling (TTS) performance; the new arXiv preprint finds that unstructured pruning, which removes only individual redundant or detrimental weights while leaving the network's structure intact, can actually improve TTS. Experiments on two reasoning LLMs (s1.1-7B and Qwen3-8B) across four benchmarks showed that unstructured pruning consistently outperformed structured pruning and sometimes even surpassed the unpruned full-weight models. The findings suggest that careful weight removal can improve efficiency without sacrificing reasoning capability.
Key facts
- Study revisits LLM pruning for test-time scaling.
- Structured pruning degrades TTS performance in reasoning LLMs.
- Unstructured pruning removes only redundant or detrimental weights.
- Experiments conducted on s1.1-7B and Qwen3-8B models.
- Four reasoning benchmarks were used.
- Unstructured pruning outperformed structured pruning.
- Unstructured pruning sometimes beats unpruned full-weight models.
- Published on arXiv with ID 2604.25098.
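The distinction above can be illustrated with a minimal sketch of magnitude-based unstructured pruning. This is a generic technique, not necessarily the criterion used in the paper (the summary does not specify it): the smallest-magnitude fraction of individual weights is zeroed, while every layer of the model is kept.

```python
def unstructured_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of individual weights.

    Generic magnitude pruning, shown here only as an illustration of
    unstructured (per-weight) pruning; the study's actual pruning
    criterion is not described in this summary.
    """
    k = int(sparsity * len(weights))  # number of weights to remove
    if k == 0:
        return list(weights)
    # Threshold = k-th smallest magnitude; everything at or below it is cut.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Toy example: prune half the weights of a small (hypothetical) layer.
w = [0.9, -0.05, 0.4, -0.7, 0.02, 0.3, -0.1, 0.8]
pruned = unstructured_prune(w, sparsity=0.5)
# The four largest-magnitude weights survive; the rest become zero:
# [0.9, 0.0, 0.4, -0.7, 0.0, 0.0, 0.0, 0.8]
```

Structured pruning, by contrast, would delete whole rows, columns, or layer blocks at once, which is what the study found to hurt TTS performance.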