SHRED: Retain-Set-Free Unlearning for LLMs via Self-Distillation
A new machine unlearning method called SHRED (Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion) has been proposed for large language models (LLMs). It aims to selectively remove memorized content, such as private data, copyrighted text, or hazardous knowledge, without requiring a retain set of curated examples, which existing methods typically need in order to prevent degradation of general model utility. SHRED operates in two stages: first, it uses per-token autoregressive probabilities to identify the high-information tokens within a forget-set instance that concentrate memorized knowledge; second, it applies self-distillation with logit demotion to those tokens. Because the method is retain-set-free, it eliminates the extra data dependency that complicates deployment of existing unlearning approaches. The paper is available on arXiv under identifier 2605.07482.
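The summary does not give SHRED's exact selection criterion, only that high-information tokens are found via per-token autoregressive probabilities. A minimal sketch of that first stage, assuming a simple fixed surprisal threshold (the threshold rule and function names here are illustrative, not from the paper):

```python
import math

def token_surprisal(token_probs):
    """Per-token surprisal s_t = -ln p(x_t | x_<t), in nats.

    `token_probs` holds the model's autoregressive probability of each
    observed token given its prefix. Low-probability tokens carry high
    surprisal and thus more information.
    """
    return [-math.log(p) for p in token_probs]

def select_high_surprisal(token_probs, threshold=2.0):
    """Indices of tokens whose surprisal exceeds `threshold`.

    The fixed-threshold rule is an assumption for illustration; the
    paper's actual criterion may differ (e.g. a top-k or adaptive cut).
    """
    surprisals = token_surprisal(token_probs)
    return [i for i, s in enumerate(surprisals) if s > threshold]

# Tokens the model assigns low probability (indices 1 and 3) are flagged
# as the high-information positions where memorized content concentrates.
print(select_high_surprisal([0.9, 0.05, 0.6, 0.01]))  # → [1, 3]
```

Only the flagged positions would then enter the second stage, which keeps the unlearning update away from tokens the model would predict anyway.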
Key facts
- SHRED stands for Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion.
- It is a retain-set-free unlearning method for LLMs.
- The method targets removal of private data, copyrighted text, or hazardous knowledge.
- It does not require a retain set of curated examples.
- SHRED uses per-token autoregressive probabilities to identify high-information tokens.
- It applies self-distillation with logit demotion to those tokens.
- The paper is available on arXiv with identifier 2605.07482.
- The method eliminates the retain-set data dependency that complicates existing unlearning approaches.
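The self-distillation step listed above can be sketched as follows. The summary does not specify SHRED's loss or demotion rule, so this assumes one plausible form: the frozen model's own output distribution, with the memorized token's logit reduced by a fixed margin, serves as the distillation target, and a KL divergence measures how far the updated model is from that target. All names and the margin `delta` are illustrative:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def demoted_target(teacher_logits, token_id, delta=5.0):
    """Self-distillation target: the frozen teacher's own logits with the
    memorized token's logit pushed down by `delta` (assumed rule)."""
    z = list(teacher_logits)
    z[token_id] -= delta
    return softmax(z)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): distillation loss between target p and student q."""
    return sum(pi * math.log(max(pi, eps) / max(qi, eps))
               for pi, qi in zip(p, q))

# At a flagged position, the teacher strongly predicts token 0 (memorized).
teacher_logits = [5.0, 1.0, 0.5]
target = demoted_target(teacher_logits, token_id=0)
# Demotion shifts the target's probability mass off the memorized token,
# so minimizing kl_divergence(target, student) unlearns it while the
# teacher's shape on all other tokens anchors general behavior.
```

Because the target is derived from the model itself rather than from held-out data, no retain set is needed; untouched logits act as the regularizer that a retain set would otherwise provide.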
Entities
Institutions
- arXiv