SHRED: Retain-Set-Free Unlearning for LLMs via Self-Distillation
A new machine unlearning method called SHRED (Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion) has been proposed for large language models (LLMs). It aims to selectively remove memorized content, such as private data, copyrighted text, or hazardous knowledge, without requiring a retain set of curated examples, which existing methods typically need in order to prevent degradation of general model utility. SHRED operates in two stages: first, it uses per-token autoregressive probabilities to identify the high-information tokens within a forget-set instance that concentrate memorized knowledge; second, it applies self-distillation with logit demotion to those tokens. Because the method is retain-set-free, it eliminates the extra data dependency that complicates deployment of existing unlearning approaches. The paper is available on arXiv under identifier 2605.07482.
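The summary does not give SHRED's exact selection criterion, only that high-information tokens are found via per-token autoregressive probabilities. A minimal sketch of that first stage, assuming a simple fixed surprisal threshold (the threshold rule and function names here are illustrative, not from the paper):

```python
import math

def token_surprisal(token_probs):
    """Per-token surprisal s_t = -ln p(x_t | x_<t), in nats.

    `token_probs` holds the model's autoregressive probability of each
    observed token given its prefix. Low-probability tokens carry high
    surprisal and thus more information.
    """
    return [-math.log(p) for p in token_probs]

def select_high_surprisal(token_probs, threshold=2.0):
    """Indices of tokens whose surprisal exceeds `threshold`.

    The fixed-threshold rule is an assumption for illustration; the
    paper's actual criterion may differ (e.g. a top-k or adaptive cut).
    """
    surprisals = token_surprisal(token_probs)
    return [i for i, s in enumerate(surprisals) if s > threshold]

# Tokens the model assigns low probability (indices 1 and 3) are flagged
# as the high-information positions where memorized content concentrates.
print(select_high_surprisal([0.9, 0.05, 0.6, 0.01]))  # → [1, 3]
```

Only the flagged positions would then enter the second stage, which keeps the unlearning update away from tokens the model would predict anyway.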
Key facts
- SHRED stands for Self-distillation via High-surprisal-only Retain-set-free Entropy Demotion.
- It is a retain-set-free unlearning method for LLMs.
- The method targets removal of private data, copyrighted text, or hazardous knowledge.
- It does not require a retain set of curated examples.
- SHRED uses per-token autoregressive probabilities to identify high-information tokens.
- It applies self-distillation with logit demotion to those tokens.
- The paper is available on arXiv with identifier 2605.07482.
- The method eliminates the retain-set data dependency that complicates existing unlearning approaches.
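The self-distillation step listed above can be sketched as follows. The summary does not specify SHRED's loss or demotion rule, so this assumes one plausible form: the frozen model's own output distribution, with the memorized token's logit reduced by a fixed margin, serves as the distillation target, and a KL divergence measures how far the updated model is from that target. All names and the margin `delta` are illustrative:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def demoted_target(teacher_logits, token_id, delta=5.0):
    """Self-distillation target: the frozen teacher's own logits with the
    memorized token's logit pushed down by `delta` (assumed rule)."""
    z = list(teacher_logits)
    z[token_id] -= delta
    return softmax(z)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): distillation loss between target p and student q."""
    return sum(pi * math.log(max(pi, eps) / max(qi, eps))
               for pi, qi in zip(p, q))

# At a flagged position, the teacher strongly predicts token 0 (memorized).
teacher_logits = [5.0, 1.0, 0.5]
target = demoted_target(teacher_logits, token_id=0)
# Demotion shifts the target's probability mass off the memorized token,
# so minimizing kl_divergence(target, student) unlearns it while the
# teacher's shape on all other tokens anchors general behavior.
```

Because the target is derived from the model itself rather than from held-out data, no retain set is needed; untouched logits act as the regularizer that a retain set would otherwise provide.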
Entities
Institutions
- arXiv