ARTFEED — Contemporary Art Intelligence

Unstructured Pruning Boosts LLM Test-Time Scaling

ai-technology · 2026-04-30

A new study challenges the assumption that pruning uniformly degrades reasoning in large language models (LLMs). While structured pruning, which removes entire layer blocks, was previously shown to harm test-time scaling (TTS) performance, the authors of a new arXiv preprint find that unstructured pruning, which removes only individual redundant or detrimental weights, can actually improve TTS. Experiments on two reasoning LLMs (s1.1-7B and Qwen3-8B) across four benchmarks showed that unstructured pruning consistently outperformed structured pruning and sometimes even surpassed the unpruned full-weight models. The findings suggest that careful weight removal can improve efficiency without sacrificing reasoning capability.
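
To make the distinction concrete, here is a minimal sketch of unstructured magnitude pruning, a common baseline in which the smallest-magnitude weights are zeroed individually while the layer's shape stays intact. This is an illustration of the general technique, not the paper's exact criterion for "redundant or detrimental" weights:

```python
import numpy as np

def unstructured_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights; the tensor shape is unchanged."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)                 # number of weights to zero
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep strictly larger weights
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = unstructured_prune(w, sparsity=0.5)
print(f"zeroed: {(pruned == 0).sum()} of {pruned.size} weights")
```

Because the shape is preserved, the pruned matrix drops into the network with no architectural changes; the efficiency gain comes from sparse storage and sparse-aware kernels rather than smaller dense dimensions.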

Key facts

  • Study revisits LLM pruning for test-time scaling.
  • Structured pruning degrades TTS performance in reasoning LLMs.
  • Unstructured pruning removes only redundant or detrimental weights.
  • Experiments conducted on s1.1-7B and Qwen3-8B models.
  • Four reasoning benchmarks were used.
  • Unstructured pruning outperformed structured pruning.
  • Unstructured pruning sometimes beats unpruned full-weight models.
  • Published on arXiv with ID 2604.25098.
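
For contrast with the facts above, a sketch of structured pruning, which removes entire rows (output neurons) and therefore shrinks the layer itself. The L2-norm ranking criterion here is a common illustrative choice, not necessarily the one used in the study:

```python
import numpy as np

def structured_prune_rows(weights: np.ndarray, n_drop: int) -> np.ndarray:
    """Remove the n_drop rows (output neurons) with the smallest L2 norm."""
    norms = np.linalg.norm(weights, axis=1)
    keep = np.argsort(norms)[n_drop:]   # indices of the rows to retain
    return weights[np.sort(keep)]       # preserve the original row order

rng = np.random.default_rng(1)
w = rng.normal(size=(6, 4))
pruned = structured_prune_rows(w, n_drop=2)
print(pruned.shape)  # two whole neurons removed, so the layer is now smaller
```

This coarser granularity is what makes structured pruning hardware-friendly, but it is also why, per the study, it is the more damaging option for test-time scaling: every weight in a removed row is lost, useful or not.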

Entities

Institutions

  • arXiv

Sources