Width Pruning Reveals Dichotomy: Instruction-Following Improves While Knowledge Tasks Degrade
A recent investigation into structured width pruning of GLU-MLP layers in Llama-3.2 models, using the Maximum Absolute Weight (MAW) criterion, uncovers a clear dichotomy in performance. Reducing the MLP expansion ratio predictably degrades parametric-knowledge tasks (MMLU, GSM8K) and perplexity, yet it substantially improves instruction following, with IFEval gains of +46% to +75% for both the 1B and 3B models, while multi-step reasoning (MUSR) remains robust. This contradicts the common assumption that pruning degrades performance uniformly. Seven expansion-ratio configurations were evaluated across a broad benchmark suite, identifying the expansion ratio as a critical architectural parameter that selectively affects different model capabilities.
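To make the technique concrete, here is a minimal sketch of MAW-style structured width pruning for a single GLU MLP block. It assumes PyTorch and Llama-style projection names (gate_proj, up_proj, down_proj, all bias-free); scoring each intermediate neuron by the largest absolute weight across all three projections, and the in-place slicing, are illustrative assumptions, not necessarily the paper's exact procedure.

```python
# Sketch of MAW-based width pruning for a GLU MLP, assuming PyTorch and
# Llama-style bias-free projections. Illustrative, not the paper's exact method.
import torch
import torch.nn as nn


def maw_prune_glu_mlp(mlp: nn.Module, target_expansion_ratio: float) -> None:
    """Prune intermediate neurons in place, keeping those with the largest
    maximum absolute weight (MAW) across gate, up, and down projections."""
    d_model = mlp.gate_proj.in_features              # hidden size
    keep = int(round(target_expansion_ratio * d_model))

    # Neuron i owns row i of gate_proj/up_proj and column i of down_proj;
    # score it by the largest |w| among those weights.
    scores = torch.maximum(
        mlp.gate_proj.weight.abs().amax(dim=1),      # shape (d_ff,)
        mlp.up_proj.weight.abs().amax(dim=1),        # shape (d_ff,)
    )
    scores = torch.maximum(scores, mlp.down_proj.weight.abs().amax(dim=0))

    # Keep the top-scoring neurons, preserving their original order.
    idx = scores.topk(keep).indices.sort().values

    # Slice all three weight matrices along the intermediate dimension.
    mlp.gate_proj.weight = nn.Parameter(mlp.gate_proj.weight[idx].clone())
    mlp.up_proj.weight = nn.Parameter(mlp.up_proj.weight[idx].clone())
    mlp.down_proj.weight = nn.Parameter(mlp.down_proj.weight[:, idx].clone())

    # Keep the Linear metadata consistent with the new shapes.
    mlp.gate_proj.out_features = mlp.up_proj.out_features = keep
    mlp.down_proj.in_features = keep
```

Applied to every decoder layer, this shrinks the intermediate dimension from d_ff to roughly ratio × d_model while leaving the attention weights untouched.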
Key facts
- Structured width pruning guided by the MAW criterion, applied to GLU-MLP layers
- Expansion ratio reduction degrades parametric knowledge tasks (MMLU, GSM8K) and perplexity
- Instruction-following improves by 46% to 75% in IFEval for Llama-3.2-1B and 3B models
- Multi-step reasoning (MUSR) remains robust under pruning
- Seven expansion ratio configurations evaluated
- Benchmarks cover factual knowledge, math reasoning, language comprehension, instruction-following, truthfulness
- Expansion ratio identified as a critical architectural parameter (see the worked example after this list)
- Pruning does not induce uniform degradation
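As a worked example of the expansion-ratio arithmetic, the snippet below computes ratios from the published Llama-3.2 hidden and intermediate sizes (1B: 2048/8192, 3B: 3072/8192); the 50% pruning target is an illustrative assumption, since the seven configurations evaluated in the study are not enumerated here.

```python
# Expansion ratio = intermediate size / hidden size. Dimensions are the
# published Llama-3.2 configurations; the 50% pruning target is an
# illustrative assumption, not one of the study's seven configurations.
configs = {"Llama-3.2-1B": (2048, 8192), "Llama-3.2-3B": (3072, 8192)}
for name, (d_model, d_ff) in configs.items():
    print(f"{name}: expansion ratio = {d_ff}/{d_model} = {d_ff / d_model:.2f}")
    kept = d_ff // 2  # e.g., prune half of the intermediate neurons
    print(f"  keeping {kept} of {d_ff} neurons gives ratio {kept / d_model:.2f}")
```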
Entities
Llama-3.2-1B; Llama-3.2-3B; GLU-MLP; Maximum Absolute Weight (MAW) criterion; IFEval; MMLU; GSM8K; MUSR