Spectral Method Recovers Damaged Language Model Capabilities
Researchers propose DG-Hard, a post-hoc repair method for catastrophic forgetting in fine-tuned language models. The method uses only the pretrained checkpoint and its fine-tuned descendant, applying the Donoho-Gavish hard singular-value threshold to weight-delta matrices. DG-Hard recovers capabilities damaged by fine-tuning while preserving target-task gains and beneficial held-out improvements, treating the fine-tuning update as a low-rank task-aligned signal embedded in noise. The approach is detailed in arXiv:2605.20296.
Key facts
- DG-Hard is a spectral repair method for fine-tuning updates.
- It uses only the pretrained checkpoint and fine-tuned descendant.
- It applies Donoho-Gavish hard singular-value thresholding.
- It recovers damaged capabilities without retraining.
- It preserves target-task gains and beneficial held-out improvements.
- The method treats the fine-tuning update as low-rank signal plus noise.
- The research is described in arXiv:2605.20296.
- Catastrophic forgetting degrades capabilities not explicitly threatened by training data.
Entities
Institutions
- arXiv