Width Pruning Reveals Dichotomy: Instruction-Following Improves While Knowledge Tasks Degrade
A recent investigation into structured width pruning of GLU-MLP layers in Llama-3.2 models, using the Maximum Absolute Weight (MAW) criterion, uncovers a clear dichotomy in performance. Reducing the MLP expansion ratio predictably degrades parametric-knowledge tasks (MMLU, GSM8K) and perplexity, yet it substantially improves instruction following, with IFEval gains of +46% to +75% for both the 1B and 3B models, while multi-step reasoning (MUSR) remains robust. This contradicts the common assumption that pruning degrades performance uniformly. Seven expansion-ratio configurations were evaluated across a broad benchmark suite, identifying the expansion ratio as a critical architectural parameter that selectively affects different model capabilities.
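To make the technique concrete, here is a minimal sketch of MAW-style structured width pruning for a single GLU MLP block. It assumes PyTorch and Llama-style projection names (gate_proj, up_proj, down_proj, all bias-free); scoring each intermediate neuron by the largest absolute weight across all three projections, and the in-place slicing, are illustrative assumptions, not necessarily the paper's exact procedure.

```python
# Sketch of MAW-based width pruning for a GLU MLP, assuming PyTorch and
# Llama-style bias-free projections. Illustrative, not the paper's exact method.
import torch
import torch.nn as nn


def maw_prune_glu_mlp(mlp: nn.Module, target_expansion_ratio: float) -> None:
    """Prune intermediate neurons in place, keeping those with the largest
    maximum absolute weight (MAW) across gate, up, and down projections."""
    d_model = mlp.gate_proj.in_features              # hidden size
    keep = int(round(target_expansion_ratio * d_model))

    # Neuron i owns row i of gate_proj/up_proj and column i of down_proj;
    # score it by the largest |w| among those weights.
    scores = torch.maximum(
        mlp.gate_proj.weight.abs().amax(dim=1),      # shape (d_ff,)
        mlp.up_proj.weight.abs().amax(dim=1),        # shape (d_ff,)
    )
    scores = torch.maximum(scores, mlp.down_proj.weight.abs().amax(dim=0))

    # Keep the top-scoring neurons, preserving their original order.
    idx = scores.topk(keep).indices.sort().values

    # Slice all three weight matrices along the intermediate dimension.
    mlp.gate_proj.weight = nn.Parameter(mlp.gate_proj.weight[idx].clone())
    mlp.up_proj.weight = nn.Parameter(mlp.up_proj.weight[idx].clone())
    mlp.down_proj.weight = nn.Parameter(mlp.down_proj.weight[:, idx].clone())

    # Keep the Linear metadata consistent with the new shapes.
    mlp.gate_proj.out_features = mlp.up_proj.out_features = keep
    mlp.down_proj.in_features = keep
```

Applied to every decoder layer, this shrinks the intermediate dimension from d_ff to roughly ratio × d_model while leaving the attention weights untouched.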
Key facts
- Structured width pruning guided by the MAW criterion, applied to GLU-MLP layers
- Expansion ratio reduction degrades parametric knowledge tasks (MMLU, GSM8K) and perplexity
- Instruction-following improves by 46% to 75% in IFEval for Llama-3.2-1B and 3B models
- Multi-step reasoning (MUSR) remains robust under pruning
- Seven expansion ratio configurations evaluated
- Benchmarks cover factual knowledge, math reasoning, language comprehension, instruction-following, truthfulness
- Expansion ratio identified as a critical architectural parameter (see the worked example after this list)
- Pruning does not induce uniform degradation
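As a worked example of the expansion-ratio arithmetic, the snippet below computes ratios from the published Llama-3.2 hidden and intermediate sizes (1B: 2048/8192, 3B: 3072/8192); the 50% pruning target is an illustrative assumption, since the seven configurations evaluated in the study are not enumerated here.

```python
# Expansion ratio = intermediate size / hidden size. Dimensions are the
# published Llama-3.2 configurations; the 50% pruning target is an
# illustrative assumption, not one of the study's seven configurations.
configs = {"Llama-3.2-1B": (2048, 8192), "Llama-3.2-3B": (3072, 8192)}
for name, (d_model, d_ff) in configs.items():
    print(f"{name}: expansion ratio = {d_ff}/{d_model} = {d_ff / d_model:.2f}")
    kept = d_ff // 2  # e.g., prune half of the intermediate neurons
    print(f"  keeping {kept} of {d_ff} neurons gives ratio {kept / d_model:.2f}")
```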
Entities
Llama-3.2-1B; Llama-3.2-3B; GLU-MLP; Maximum Absolute Weight (MAW) criterion; IFEval; MMLU; GSM8K; MUSR