ASRU: A New Framework for Machine Unlearning in Multimodal LLMs
Researchers have proposed ASRU, a controllable multimodal unlearning framework for removing sensitive cross-modal information from multimodal large language models (MLLMs) while preserving generation quality. Existing unlearning methods often judge effectiveness solely by output deviation, which can leave unlearned models producing hallucinated or rigid responses. ASRU instead treats generation quality as a core evaluation objective. It first induces refusal behavior through activation redirection, then optimizes fine-grained refusal boundaries with a customized reward function. Experiments on Qwen3-VL show that ASRU improves unlearning effectiveness by 24.6% on average while maintaining model utility. The paper is available on arXiv under ID 2605.15687.
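To make the idea of activation redirection more concrete, here is a minimal sketch of a generic activation-steering step. It is not taken from the paper: the function names, the single "refusal direction" vector, and the `strength` parameter are all assumptions chosen for illustration, and ASRU's actual formulation may differ. The sketch replaces the component of a hidden state along a refusal direction with a fixed value and leaves the orthogonal part untouched.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def redirect_activation(hidden, refusal_dir, strength=1.0):
    """Hypothetical steering step (an assumption, not ASRU's published method).

    Sets the projection of `hidden` onto the unit refusal direction to
    `strength`, keeping every component orthogonal to that direction as is.
    """
    r = normalize(refusal_dir)
    current = dot(hidden, r)      # how far the state already points along r
    shift = strength - current    # amount needed to reach the target projection
    return [h + shift * ri for h, ri in zip(hidden, r)]


# Toy 2-D hidden state; real MLLM activations have thousands of dimensions.
print(redirect_activation([3.0, 4.0], [1.0, 0.0], strength=2.0))  # [2.0, 4.0]
```

In a real model, a step like this would presumably be applied only to inputs that touch the knowledge to be forgotten, so that unrelated prompts keep their original activations.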
Key facts
- ASRU is a controllable multimodal unlearning framework for MLLMs.
- It incorporates generation quality as a core evaluation objective.
- ASRU uses activation redirection to induce refusal behavior.
- A customized reward function optimizes fine-grained refusal boundaries.
- Experiments on Qwen3-VL show a 24.6% average improvement in unlearning effectiveness.
- The framework aims to balance target knowledge unlearning and model utility.
- According to the authors, existing methods often overlook generation quality after unlearning.
- The paper is published on arXiv with ID 2605.15687.
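The facts above mention a customized reward function that shapes refusal boundaries, but this summary does not specify its form. The sketch below is a toy stand-in under stated assumptions: the marker phrases, the `quality` score (assumed to come from some external scorer in the range 0 to 1), the `leak_terms` check, and the numeric weights are all hypothetical. It only illustrates the balance described here: refuse on content to be forgotten, answer normally elsewhere, and count response quality in both cases.

```python
# Hypothetical refusal phrases; a real system would need a more robust detector.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable")


def is_refusal(response):
    """Crude check for refusal phrasing (an assumption for this sketch)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def toy_reward(response, in_forget_set, quality, leak_terms=()):
    """Toy reward, not the paper's function.

    response      -- model output text
    in_forget_set -- True if the prompt targets knowledge to be unlearned
    quality       -- assumed external quality score in [0, 1]
    leak_terms    -- strings whose presence counts as leaking target knowledge
    """
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be in [0, 1]")
    refused = is_refusal(response)
    leaked = any(term.lower() in response.lower() for term in leak_terms)

    if in_forget_set:
        if leaked:
            return -1.0                    # target knowledge still surfaced
        if refused:
            return 0.5 + 0.5 * quality     # refusal, scaled by how well-formed it is
        return 0.0                         # no leak, but no clear refusal either
    # Prompts outside the forget set: refusing is over-refusal and hurts utility.
    return -1.0 if refused else quality
```

For example, with the fictional target name "Alice", `toy_reward("That is Alice.", True, 0.9, ("Alice",))` returns -1.0, while `toy_reward("A cat on a sofa.", False, 0.8)` returns 0.8, so the toy reward penalizes both leaking and over-refusing.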
Entities
Institutions
- arXiv