Machine Unlearning Blind Spots: Over-Unlearning and Relearning Attacks
A new paper on arXiv (2506.01318) identifies two critical blind spots in machine unlearning (MU): over-unlearning, which degrades retained data near the forget set, and prototypical relearning attacks, which can resurrect forgotten knowledge from only a few samples. The authors propose an over-unlearning metric, OU@epsilon, to quantify collateral damage, and introduce Spotter, a plug-and-play objective whose components include a masked knowledge-distillation penalty, to counter both issues in class-level unlearning.
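To make the idea of a masked knowledge-distillation penalty concrete, here is a minimal sketch of one generic formulation. It is not taken from the paper, and Spotter's actual loss may differ. The sketch assumes the original model acts as a teacher, that the forget class is masked out of the teacher's output distribution, and that the unlearned model (the student) is penalized by the KL divergence from that masked target. The function names `masked_softmax`, `log_softmax`, and `masked_kd_penalty` are hypothetical.

```python
import math

def masked_softmax(logits, mask_index):
    """Softmax over logits with the class at mask_index forced to probability 0."""
    kept = [i for i in range(len(logits)) if i != mask_index]
    peak = max(logits[i] for i in kept)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in kept}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]

def log_softmax(logits):
    """Numerically stable log-softmax."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]

def masked_kd_penalty(teacher_logits, student_logits, forget_class):
    """KL(masked teacher || student) for a single sample.

    Assumed formulation for illustration only: the teacher's forget-class
    probability is removed and the remaining mass renormalized, so the student
    is rewarded for matching the teacher on every class except the forget class.
    """
    target = masked_softmax(teacher_logits, forget_class)
    student_log_probs = log_softmax(student_logits)
    return sum(
        t * (math.log(t) - lp)
        for t, lp in zip(target, student_log_probs)
        if t > 0.0
    )

# Toy logits for a 3-class problem; class 0 is the forget class.
teacher = [2.0, 1.0, 0.5]
student_unchanged = [2.0, 1.0, 0.5]   # still predicts the forget class
student_forgot = [-50.0, 1.0, 0.5]    # forget class suppressed, the rest preserved

penalty_unchanged = masked_kd_penalty(teacher, student_unchanged, forget_class=0)
penalty_forgot = masked_kd_penalty(teacher, student_forgot, forget_class=0)
```

In this toy setup the penalty is near zero for the student that drops the forget class while keeping the teacher's relative preferences among the remaining classes, and it is larger for the student that still assigns mass to the forget class. In a setting like the one the summary describes, such a term would presumably be evaluated on inputs near the forget set, where over-unlearning is said to occur.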
Key facts
- Paper arXiv:2506.01318 addresses over-unlearning and relearning attacks in machine unlearning.
- Over-unlearning degrades model performance on retained data near the forget set.
- The Prototypical Relearning Attack exploits per-class prototypes to restore pre-unlearning performance.
- OU@epsilon metric quantifies over-unlearning in regions proximal to the forget set.
- Spotter is a plug-and-play objective with a masked knowledge-distillation penalty.
- Focus is on class-level unlearning.
- The paper was announced on arXiv with the announce type replace-cross, i.e., a revised version of a cross-listed paper.
- The attack requires only a few samples of the forget class.
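The prototype idea behind the relearning attack can be illustrated with a generic nearest-prototype classifier. The sketch below is not the paper's procedure, which this summary does not spell out. It assumes that an attacker can read embeddings from the unlearned model, averages a few forget-class embeddings into a prototype, and then labels queries by the closest prototype. The 2-D embeddings, the class labels, and the helper names `prototype` and `nearest_prototype` are all hypothetical.

```python
import math

def prototype(embeddings):
    """Mean vector of a list of equal-length embeddings."""
    n = len(embeddings)
    return [sum(dim) / n for dim in zip(*embeddings)]

def nearest_prototype(query, prototypes):
    """Return the label whose prototype is closest to query (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Toy prototypes for the classes the unlearned model still recognizes.
retain_prototypes = {"cat": [0.0, 0.0], "dog": [4.0, 0.0]}

# A handful of forget-class embeddings, assumed to still cluster together
# in the unlearned model's feature space.
few_forget_samples = [[2.0, 3.0], [2.2, 2.8], [1.8, 3.2]]

# The attacker adds one prototype built from those few samples.
attacked_prototypes = dict(retain_prototypes)
attacked_prototypes["forgotten"] = prototype(few_forget_samples)

query = [2.1, 2.9]  # embedding of an unseen forget-class input
before_attack = nearest_prototype(query, retain_prototypes)
after_attack = nearest_prototype(query, attacked_prototypes)
```

Here the query falls to a retained class before the attack and to the restored class afterwards. The sketch only works because the forget-class embeddings remain tightly clustered, which is the property a few-sample prototype relies on.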
Entities
Institutions
- arXiv