Unlearnability and Unlearning for Model Dememorization
A recent study posted to arXiv (2605.11592) examines two families of model-dememorization techniques that aim to curb unauthorized data exploitation in machine learning: availability poisoning (unlearnability) and machine unlearning. Unlearnability embeds subtle perturbations into data before release to reduce how much models can learn from it, while unlearning removes information from a model after training. The study finds that the two approaches share a common weakness, shallow dememorization: weight changes can support misleading claims of reduced learnability or of forgetting. The approaches also interact: input perturbations can affect downstream unlearning, and unlearning can inadvertently recover domain knowledge that unlearnability had hidden. The authors conclude that stronger protective mechanisms are needed.
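To make the unlearnability side concrete, one well-known instantiation from the literature is error-minimizing noise ("unlearnable examples"): a small bounded perturbation is optimized so the poisoned sample already yields near-zero loss, leaving a trainer little gradient signal to learn from. The PyTorch sketch below is illustrative only, not necessarily the paper's method; the function name, the surrogate `model`, and the budget `eps` are assumptions for the example.

```python
import torch

def error_minimizing_noise(model, loss_fn, x, y, eps=8 / 255, steps=20, lr=0.01):
    """Craft a small, bounded perturbation that *minimizes* the training loss,
    so (x + delta, y) looks 'already learned' and carries little signal.
    Illustrative sketch: `model` is a frozen surrogate, not the paper's setup."""
    model.eval()
    for p in model.parameters():              # freeze the surrogate's weights
        p.requires_grad_(False)
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.SGD([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x + delta), y).backward()   # descend on loss w.r.t. delta
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)           # keep the change imperceptible
    return (x + delta).detach()
```

The study's "shallow dememorization" finding suggests such input-level perturbations may suppress learnability less deeply than the loss values imply.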
Key facts
- arXiv paper 2605.11592 surveys unlearnability and machine unlearning.
- Unlearnability embeds perturbations into data before release.
- Unlearning removes information from models post-training (a naive baseline is sketched after this list).
- Both methods suffer from shallow dememorization.
- Weight perturbations can produce false claims of forgetting.
- Input perturbations may affect downstream unlearning.
- Unlearning may recover domain knowledge hidden by unlearnability.
- The paper identifies shared vulnerabilities in dememorization.
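For the unlearning side, a common baseline in the literature (not necessarily the paper's method) is gradient ascent on the forget set: push the model's loss up on the data to be removed. The sketch below, with illustrative names such as `forget_loader`, shows why this amounts to a weight perturbation; the paper's point is that such updates can masquerade as forgetting without removing the underlying memorization.

```python
import torch

def gradient_ascent_unlearn(model, loss_fn, forget_loader, lr=1e-4, epochs=1):
    """Naive unlearning baseline: gradient ascent on the forget set.
    Illustrative sketch; weight-only updates like this are the kind of
    shallow dememorization the study warns can merely look like forgetting."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in forget_loader:
            opt.zero_grad()
            (-loss_fn(model(x), y)).backward()  # negated loss => ascend on forget data
            opt.step()
    return model
```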
Entities
Institutions
- arXiv