ReMem: A Reliable Benchmark for LVLM Unlearning
A recent investigation published on arXiv (2605.03759) reveals a significant flaw in existing unlearning benchmarks for Large Vision-Language Models (LVLMs): they do not verify that models have actually memorized the target information in the first place, rendering subsequent unlearning assessments unreliable. The researchers pinpoint under-memorization and the multi-hop curse as the root causes. To address this, they propose ReMem (Reliable Multi-hop and Multi-image Memorization Benchmark), which enforces solid foundational learning through principled data scaling, reasoning-aware QA pairs, and diverse visual contexts. A new Exposure metric additionally quantifies how deeply information has been erased from the model's internal probability distribution. Experiments show that ReMem provides a rigorous framework for diagnosing failures in both the learning and unlearning stages.
Key facts
- arXiv paper 2605.03759 identifies a stage-1 (memorization) failure in existing LVLM unlearning benchmarks
- Models fail to effectively memorize target information initially
- Under-memorization and multi-hop curse are root causes
- ReMem benchmark ensures robust foundational learning
- ReMem uses principled data scaling, reasoning-aware QA pairs, diverse visual contexts
- Novel Exposure metric quantifies information erasure depth
- Experiments demonstrate ReMem provides rigorous framework for diagnosing learning and unlearning failures
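The paper's exact formulation of the Exposure metric is not given in this summary, but the idea of measuring erasure depth from the model's internal probability distribution can be illustrated with a minimal sketch. The assumption here (not confirmed by the source) is that Exposure aggregates the log-probability the model assigns to the target answer tokens; the function name `exposure_score` and the toy distributions are hypothetical stand-ins for a real LVLM's next-token probabilities.

```python
import math

def exposure_score(token_probs, target_tokens):
    """Hypothetical Exposure-style metric: mean log-probability that the
    model assigns to the target answer tokens. Values near 0 mean the
    information is still exposed; strongly negative values suggest it has
    been erased from the model's output distribution."""
    logps = [math.log(token_probs[t]) for t in target_tokens]
    return sum(logps) / len(logps)

# Toy next-token distributions standing in for an LVLM's predictions
# before and after unlearning (illustrative numbers only).
before = {"Paris": 0.90, "London": 0.05, "Rome": 0.05}
after = {"Paris": 0.02, "London": 0.49, "Rome": 0.49}

target = ["Paris"]
print(exposure_score(before, target))  # near 0: target still exposed
print(exposure_score(after, target))   # strongly negative: largely erased
```

A distribution-level metric like this can detect residual memorization that answer-accuracy checks miss: a model may stop producing the target answer while still assigning it disproportionate probability mass.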
Entities
Institutions
- arXiv