Task-Aware Pruning Improves Out-of-Distribution Model Performance
A recent study published on arXiv (2605.14738) examines task-aware layer pruning, a technique promoted by TALE. The findings indicate that pruning does not improve performance on in-distribution (ID) data but consistently improves out-of-distribution (OOD) accuracy, both in controlled polynomial regression tasks and in large language models. The researchers show that OOD inputs produce layerwise norm and pairwise-distance profiles that deviate from those of ID inputs, which suggests a geometric interpretation: each task induces its own representational geometry, and OOD inputs yield a distorted version of it. Task-aware pruning identifies and removes the layers that create or amplify this distortion, thereby altering the representational norms of OOD inputs.
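To make "layerwise norm and pairwise-distance profiles" concrete, the sketch below computes both quantities on a toy stack of random residual layers. It is an illustration only, not the paper's code: the helper names (make_layers, apply_layer, layer_profiles), the tanh residual update, and the use of a wider Gaussian as a stand-in for OOD inputs are all assumptions made for this example.

```python
import math
import random


def make_layers(depth, width, seed=0):
    """Build random weight matrices (hypothetical stand-in for a trained model)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(width)
    return [
        [[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(width)]
        for _ in range(depth)
    ]


def apply_layer(weights, x):
    """Residual update x + tanh(W x), an assumed layer form for this toy."""
    return [
        xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
        for xi, row in zip(x, weights)
    ]


def layer_profiles(layers, batch):
    """Return (mean norm, mean pairwise distance) after each layer."""
    norms, dists = [], []
    hidden = [list(x) for x in batch]
    for weights in layers:
        hidden = [apply_layer(weights, h) for h in hidden]
        # Mean Euclidean norm of the representations at this depth.
        norms.append(sum(math.hypot(*h) for h in hidden) / len(hidden))
        # Mean Euclidean distance over all unordered pairs in the batch.
        pairs = [
            math.dist(hidden[i], hidden[j])
            for i in range(len(hidden))
            for j in range(i + 1, len(hidden))
        ]
        dists.append(sum(pairs) / len(pairs))
    return norms, dists


rng = random.Random(1)
layers = make_layers(depth=4, width=8)
# Assumption: ID inputs are unit Gaussians, "OOD" inputs use a 3x wider scale.
id_batch = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
ood_batch = [[rng.gauss(0.0, 3.0) for _ in range(8)] for _ in range(16)]
id_norms, id_dists = layer_profiles(layers, id_batch)
ood_norms, ood_dists = layer_profiles(layers, ood_batch)
```

Comparing id_norms with ood_norms (and id_dists with ood_dists) layer by layer gives the kind of profile comparison the study describes. In a real model the layers would be trained blocks and the OOD batch would come from a shifted task distribution.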
Key facts
- arXiv paper 2605.14738 investigates task-aware layer pruning
- Pruning shows no benefit on in-distribution data
- Pruning consistently improves out-of-distribution accuracy
- Study covers polynomial regression tasks and large language models
- OOD inputs induce layerwise norm and pairwise-distance profiles that deviate from ID profiles
- Geometric explanation: task-adapted geometry is distorted by OOD inputs
- Pruning removes layers that create or amplify distortion
- Technique promoted by TALE
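One way to picture the pruning step in the list above is a greedy search that removes a layer whenever doing so improves a task score. The summary does not specify the actual selection procedure, so this is a sketch under assumptions: greedy_prune and the evaluate callback are hypothetical names, and the toy "layers" are plain additive offsets rather than network blocks.

```python
def greedy_prune(layers, evaluate):
    """Greedily drop layers while the task score improves.

    `layers` is any list of layer objects. `evaluate(kept_layers)` is a
    hypothetical callback that returns a task score (higher is better).
    """
    kept = list(layers)
    best = evaluate(kept)
    while len(kept) > 1:
        # Score every candidate model obtained by removing exactly one layer.
        candidates = [
            (evaluate(kept[:i] + kept[i + 1:]), i) for i in range(len(kept))
        ]
        score, idx = max(candidates)
        if score <= best:
            break  # no single removal helps any further
        best = score
        del kept[idx]
    return kept, best


# Toy task: each "layer" adds an offset, and the ideal total offset is 3.0.
toy_layers = [1.0, 2.0, 5.0, -0.5]


def toy_score(kept_layers):
    """Negative absolute error of the summed offsets against the target 3.0."""
    return -abs(sum(kept_layers) - 3.0)


kept, best = greedy_prune(toy_layers, toy_score)
```

In this toy the search first removes the 5.0 offset and then the -0.5 offset, leaving [1.0, 2.0] with zero error. In the setting the study describes, the score would be measured on task data, and the removed layers would be those that create or amplify the distortion.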
Entities
Institutions
- arXiv
- TALE