Ghosted Layers: Training-Free Recovery for Layer-Pruned LLMs
Ghosted Layers is a training-free recovery module for large language models (LLMs) that have undergone layer pruning. Layer pruning removes entire Transformer decoder blocks, so the next surviving layer receives hidden states that no longer match the distribution it was trained on, which causes a notable drop in performance. Ghosted Layers treats this as a boundary activation alignment problem: from a small calibration set, it derives a closed-form optimal linear operator that reconstructs the activation discrepancy introduced by the pruned layers. Unlike existing methods, which are restricted to constrained solutions over limited operator subspaces, this approach attains the unconstrained optimum of the alignment objective. Experiments across multiple LLM architectures and pruning techniques show consistent improvements in accuracy and perplexity over prior training-free baselines.
Key facts
- Ghosted Layers is a training-free recovery module for layer-pruned LLMs.
- Layer pruning removes entire Transformer decoder blocks.
- Pruning leaves the next surviving layer with hidden states that no longer match the distribution it was trained on.
- The method solves a boundary activation alignment problem.
- It derives a closed-form optimal linear operator from a small calibration set.
- The solution is the unconstrained optimum of the alignment objective.
- Existing methods are restricted to constrained solutions over limited operator subspaces.
- Experiments show consistent improvements in accuracy and perplexity over prior training-free baselines.
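To make the closed-form step concrete, here is a minimal sketch under stated assumptions. The summary does not spell out the alignment objective, so the sketch assumes a ridge-regularized least-squares fit of a full (unconstrained) matrix W to the residual between the hidden states entering and leaving the pruned blocks. The function names, the `ridge` term, and the synthetic data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def fit_linear_recovery(h_in, h_out, ridge=1e-6):
    """Least-squares fit of W such that h_in + h_in @ W approximates h_out.

    h_in:  (n_tokens, d) hidden states entering the first pruned block.
    h_out: (n_tokens, d) hidden states leaving the last pruned block.
    ridge: small Tikhonov term (an assumption here) to keep the solve stable.
    """
    delta = h_out - h_in                      # discrepancy contributed by the pruned blocks
    d = h_in.shape[1]
    gram = h_in.T @ h_in + ridge * np.eye(d)  # (d, d) regularized Gram matrix
    cross = h_in.T @ delta                    # (d, d) cross term
    return np.linalg.solve(gram, cross)       # W = (X^T X + ridge I)^{-1} X^T delta


def apply_recovery(h, w):
    """Stand in for the removed blocks: residual path plus the linear correction."""
    return h + h @ w


def demo(seed=0, n_tokens=512, d=16):
    """Synthetic check: the pruned blocks are simulated by a linear map plus noise."""
    rng = np.random.default_rng(seed)
    h_in = rng.normal(size=(n_tokens, d))
    true_map = 0.1 * rng.normal(size=(d, d))
    h_out = h_in + h_in @ true_map + 0.01 * rng.normal(size=(n_tokens, d))

    w = fit_linear_recovery(h_in, h_out)
    err_skip = np.linalg.norm(h_out - h_in)                    # plain pruning, no recovery
    err_fit = np.linalg.norm(h_out - apply_recovery(h_in, w))  # with the fitted operator
    return err_skip, err_fit
```

Calling `demo()` returns the reconstruction error without and with the fitted operator on the synthetic calibration data. Under this assumed objective, restricting W to a subspace (for example a scaled identity or a diagonal matrix) cannot achieve a lower calibration error than the full solve, which is one way to read the "unconstrained optimum" claim.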