Unstructured Pruning Boosts LLM Test-Time Scaling
A new study challenges the assumption that pruning degrades reasoning in large language models (LLMs). Structured pruning, which removes entire layer blocks, was previously shown to harm test-time scaling (TTS) performance; the new arXiv preprint finds that unstructured pruning, which removes only individual redundant or detrimental weights while leaving the network's structure intact, can actually improve TTS. Experiments on two reasoning LLMs (s1.1-7B and Qwen3-8B) across four benchmarks showed that unstructured pruning consistently outperformed structured pruning and sometimes even surpassed the unpruned full-weight models. The findings suggest that careful weight removal can improve efficiency without sacrificing reasoning capability.
Key facts
- Study revisits LLM pruning for test-time scaling.
- Structured pruning degrades TTS performance in reasoning LLMs.
- Unstructured pruning removes only redundant or detrimental weights.
- Experiments conducted on s1.1-7B and Qwen3-8B models.
- Four reasoning benchmarks were used.
- Unstructured pruning outperformed structured pruning.
- Unstructured pruning sometimes beats unpruned full-weight models.
- Published on arXiv with ID 2604.25098.
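The distinction above can be illustrated with a minimal sketch of magnitude-based unstructured pruning. This is a generic technique, not necessarily the criterion used in the paper (the summary does not specify it): the smallest-magnitude fraction of individual weights is zeroed, while every layer of the model is kept.

```python
def unstructured_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of individual weights.

    Generic magnitude pruning, shown here only as an illustration of
    unstructured (per-weight) pruning; the study's actual pruning
    criterion is not described in this summary.
    """
    k = int(sparsity * len(weights))  # number of weights to remove
    if k == 0:
        return list(weights)
    # Threshold = k-th smallest magnitude; everything at or below it is cut.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Toy example: prune half the weights of a small (hypothetical) layer.
w = [0.9, -0.05, 0.4, -0.7, 0.02, 0.3, -0.1, 0.8]
pruned = unstructured_prune(w, sparsity=0.5)
# The four largest-magnitude weights survive; the rest become zero:
# [0.9, 0.0, 0.4, -0.7, 0.0, 0.0, 0.0, 0.8]
```

Structured pruning, by contrast, would delete whole rows, columns, or layer blocks at once, which is what the study found to hurt TTS performance.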