GISP: A Global Pruning Method for Efficient LLMs
GISP (Global Iterative Structured Pruning) is a new structured pruning method that improves the efficiency of large language models (LLMs) without requiring fine-tuning. Unlike the dominant local pruning paradigm, which is task-agnostic and preserves perplexity but limits downstream gains, GISP removes attention heads and MLP channels using global, loss-based importance scores with block-wise normalization. It adopts an iterative schedule rather than one-shot pruning, which stabilizes accuracy at higher sparsity levels and mitigates perplexity collapse. The method is post-training and aims to deliver compact, hardware-friendly architectures that capitalize on task-specific calibration signals. The research is presented in arXiv paper 2510.18030.
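To make the scoring concrete, below is a minimal PyTorch sketch of first-order, loss-based head importance with block-wise normalization, in the spirit of the paragraph above. The model layout (`model.blocks`, a fused `qkv` projection, `num_heads`) and all names are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def head_importance(model, calib_batch, loss_fn):
    """First-order importance |weight * grad| per attention head,
    L2-normalized within each transformer block so scores are
    comparable across blocks for a global ranking.
    Assumed layout: model.blocks[i].attn.qkv is a fused QKV linear."""
    model.zero_grad()
    logits = model(calib_batch["input_ids"])          # assumed forward signature
    loss = loss_fn(logits, calib_batch["labels"])
    loss.backward()                                   # populates .grad for the scores

    scores = {}  # (block_idx, head_idx) -> importance
    for b, blk in enumerate(model.blocks):
        W = blk.attn.qkv.weight                       # shape [3 * H * d, hidden]
        G = W.grad
        H = blk.attn.num_heads
        d = W.shape[0] // (3 * H)                     # per-head dimension
        # Aggregate |w * g| over Q, K, V and all weights belonging to each head.
        per_head = (W * G).abs().view(3, H, d, -1).sum(dim=(0, 2, 3))
        # Block-wise normalization: rescale so blocks contribute comparably.
        per_head = per_head / (per_head.norm() + 1e-8)
        for h in range(H):
            scores[(b, h)] = per_head[h].item()
    return scores
```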
Key facts
- GISP stands for Global Iterative Structured Pruning.
- It removes attention heads and MLP channels.
- Uses first-order, loss-based importance scores.
- Employs block-wise normalization.
- Adopts an iterative pruning schedule (see the sketch after this list).
- Aims to improve downstream task performance.
- Operates post-training without fine-tuning.
- Designed for large language models (LLMs).
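The iterative schedule can be sketched as a loop that removes a small slice of the globally least-important heads per step, re-scoring before each removal rather than pruning one-shot. This is a hypothetical sketch building on the scoring function above; `prune_heads` and the step sizing are assumptions, not the paper's code.

```python
import math

def iterative_global_prune(model, calib_batch, loss_fn, target_sparsity, steps=8):
    """Globally rank heads by first-order importance and prune a small
    fraction per step, re-scoring between steps (iterative, not one-shot)."""
    total = sum(blk.attn.num_heads for blk in model.blocks)
    keep_target = math.ceil(total * (1 - target_sparsity))
    per_step = max(1, (total - keep_target) // steps)

    remaining = total
    while remaining > keep_target:
        # Re-score each step: earlier removals shift the remaining heads' importance.
        scores = head_importance(model, calib_batch, loss_fn)
        n = min(per_step, remaining - keep_target)
        victims = sorted(scores, key=scores.get)[:n]  # globally least important
        prune_heads(model, victims)                   # hypothetical helper that removes heads
        remaining -= len(victims)
```

Re-scoring between steps is what distinguishes the iterative schedule from one-shot pruning: a head that looked redundant under the full model may become important once its neighbors are gone, which is how the method avoids perplexity collapse at higher sparsity.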