ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models
ZeroUnlearn is a newly introduced framework for erasing sensitive data from large language models without expensive retraining. It recasts machine unlearning as a knowledge re-mapping problem solved through model editing. Using only a few samples, it overwrites sensitive inputs with a neutral target state, which discards their original representations. A multiplicative parameter update with a closed-form solution enforces representational orthogonality, making the unlearning efficient and precise. A gradient-based variant extends the framework to multi-sample unlearning. Experiments indicate that the method removes the targeted knowledge while preserving model utility. The full paper is available on arXiv.
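To make the idea of a closed-form multiplicative update concrete, here is a minimal NumPy sketch on a single linear layer. It is not the paper's formula, which this summary does not reproduce. The function `multiplicative_remap`, the key vector `k`, and the target `v_neutral` are hypothetical names, and treating one weight matrix as the edited component is an assumption. The sketch right-multiplies the weights by a rank-one correction so that the sensitive key maps to the neutral target, while inputs orthogonal to that key are left unchanged.

```python
import numpy as np

def multiplicative_remap(W, k, v_target):
    """Return W @ M, where M = I + (u - k) k^T / (k^T k).

    u is the least-squares solution of W u = v_target, so M sends the key k
    to a point that W maps to the target. Any input orthogonal to k passes
    through M unchanged. Illustrative only, not the published update rule.
    """
    k = np.asarray(k, dtype=float)
    u, *_ = np.linalg.lstsq(W, v_target, rcond=None)
    M = np.eye(k.size) + np.outer(u - k, k) / (k @ k)
    return W @ M

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))      # toy layer weights (d_out x d_in)
k = rng.normal(size=6)           # representation of the sensitive input
v_neutral = rng.normal(size=4)   # assumed neutral target output

W_new = multiplicative_remap(W, k, v_neutral)

# An unrelated input: a random vector with its component along k removed.
r = rng.normal(size=6)
x_other = r - (r @ k) / (k @ k) * k

print(np.allclose(W_new @ k, v_neutral))          # sensitive key now hits the target
print(np.allclose(W_new @ x_other, W @ x_other))  # orthogonal input is unaffected
```

With a zero target, `u` is zero and `M` reduces to the orthogonal projector that removes the direction of `k`, which is one simple way to read "neutral target state" in this toy setting.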
Key facts
- ZeroUnlearn is a few-shot unlearning framework for large language models.
- It reformulates unlearning as a knowledge re-mapping problem via model editing.
- Sensitive inputs are overwritten to a neutral target state.
- The original representations of those inputs are discarded.
- Representational orthogonality is enforced via a multiplicative parameter update with a closed-form solution.
- A gradient-based variant handles multi-sample unlearning.
- The method avoids expensive retraining or aggressive fine-tuning.
- The paper is published on arXiv with ID 2605.18879.
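The key facts do not state the objective or update rule of the gradient-based variant, so the following is only one plausible reading, given as a hedged sketch. It fits a multiplicative factor `M` by gradient descent so that several sensitive keys map to a shared neutral target at once. The function `gradient_remap`, the squared-error loss, and the step-size choice are all assumptions made for illustration.

```python
import numpy as np

def gradient_remap(W, K, V, steps=500):
    """Fit M by gradient descent so that W @ M @ K approximates V.

    K holds the sensitive keys as columns (d_in x n), V the targets as
    columns (d_out x n). The loss is 0.5 * ||W M K - V||_F^2.
    Illustrative only, not the published algorithm.
    """
    M = np.eye(W.shape[1])
    # Step size 1/L, where L = ||W||_2^2 * ||K||_2^2 is the Lipschitz
    # constant of the gradient, so the loss never increases.
    lr = 1.0 / (np.linalg.norm(W, 2) ** 2 * np.linalg.norm(K, 2) ** 2)
    losses = []
    for _ in range(steps):
        R = W @ M @ K - V                    # residual for every sample
        losses.append(0.5 * np.sum(R ** 2))  # loss before this update
        M -= lr * (W.T @ R @ K.T)            # gradient of the loss w.r.t. M
    return M, losses

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))                   # toy layer weights
K = rng.normal(size=(6, 3))                   # three sensitive keys
V = np.tile(rng.normal(size=(4, 1)), (1, 3))  # same assumed neutral target for all

M, losses = gradient_remap(W, K, V)
W_new = W @ M                                 # multiplicatively edited weights
print(losses[0], losses[-1])
```

Starting from the identity means the first step is a small perturbation of the original weights. Unlike a rank-one closed-form edit, this loop handles any number of samples at the cost of iteration.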
Entities
Institutions
- arXiv