GISP: A Global Pruning Method for Efficient LLMs
GISP (Global Iterative Structured Pruning) is a new structured pruning method that improves the efficiency of large language models (LLMs) without requiring fine-tuning. Unlike the dominant local pruning paradigm, which is task-agnostic and preserves perplexity but limits downstream gains, GISP removes attention heads and MLP channels using global, loss-based importance scores with block-wise normalization. It adopts an iterative schedule rather than one-shot pruning, which stabilizes accuracy at higher sparsity levels and mitigates perplexity collapse. The method is post-training and aims to deliver compact, hardware-friendly architectures that capitalize on task-specific calibration signals. The research is presented in arXiv paper 2510.18030.
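To make the scoring concrete, below is a minimal PyTorch sketch of first-order, loss-based head importance with block-wise normalization, in the spirit of the paragraph above. The model layout (`model.blocks`, a fused `qkv` projection, `num_heads`) and all names are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def head_importance(model, calib_batch, loss_fn):
    """First-order importance |weight * grad| per attention head,
    L2-normalized within each transformer block so scores are
    comparable across blocks for a global ranking.
    Assumed layout: model.blocks[i].attn.qkv is a fused QKV linear."""
    model.zero_grad()
    logits = model(calib_batch["input_ids"])          # assumed forward signature
    loss = loss_fn(logits, calib_batch["labels"])
    loss.backward()                                   # populates .grad for the scores

    scores = {}  # (block_idx, head_idx) -> importance
    for b, blk in enumerate(model.blocks):
        W = blk.attn.qkv.weight                       # shape [3 * H * d, hidden]
        G = W.grad
        H = blk.attn.num_heads
        d = W.shape[0] // (3 * H)                     # per-head dimension
        # Aggregate |w * g| over Q, K, V and all weights belonging to each head.
        per_head = (W * G).abs().view(3, H, d, -1).sum(dim=(0, 2, 3))
        # Block-wise normalization: rescale so blocks contribute comparably.
        per_head = per_head / (per_head.norm() + 1e-8)
        for h in range(H):
            scores[(b, h)] = per_head[h].item()
    return scores
```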
Key facts
- GISP stands for Global Iterative Structured Pruning.
- It removes attention heads and MLP channels.
- Uses first-order, loss-based importance scores.
- Employs block-wise normalization.
- Adopts an iterative pruning schedule (see the sketch after this list).
- Aims to improve downstream task performance.
- Operates post-training without fine-tuning.
- Designed for large language models (LLMs).
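The iterative schedule can be sketched as a loop that removes a small slice of the globally least-important heads per step, re-scoring before each removal rather than pruning one-shot. This is a hypothetical sketch building on the scoring function above; `prune_heads` and the step sizing are assumptions, not the paper's code.

```python
import math

def iterative_global_prune(model, calib_batch, loss_fn, target_sparsity, steps=8):
    """Globally rank heads by first-order importance and prune a small
    fraction per step, re-scoring between steps (iterative, not one-shot)."""
    total = sum(blk.attn.num_heads for blk in model.blocks)
    keep_target = math.ceil(total * (1 - target_sparsity))
    per_step = max(1, (total - keep_target) // steps)

    remaining = total
    while remaining > keep_target:
        # Re-score each step: earlier removals shift the remaining heads' importance.
        scores = head_importance(model, calib_batch, loss_fn)
        n = min(per_step, remaining - keep_target)
        victims = sorted(scores, key=scores.get)[:n]  # globally least important
        prune_heads(model, victims)                   # hypothetical helper that removes heads
        remaining -= len(victims)
```

Re-scoring between steps is what distinguishes the iterative schedule from one-shot pruning: a head that looked redundant under the full model may become important once its neighbors are gone, which is how the method avoids perplexity collapse at higher sparsity.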