ReMem: A Reliable Benchmark for LVLM Unlearning
A recent investigation published on arXiv (2605.03759) reveals a significant flaw in existing unlearning benchmarks for Large Vision-Language Models (LVLMs): they do not verify that models have actually memorized the target information in the first place, rendering subsequent unlearning assessments unreliable. The researchers pinpoint under-memorization and the multi-hop curse as the root causes. To address this, they propose ReMem (Reliable Multi-hop and Multi-image Memorization Benchmark), which enforces solid foundational learning through principled data scaling, reasoning-aware QA pairs, and diverse visual contexts. A new Exposure metric additionally quantifies how deeply information has been erased from the model's internal probability distribution. Experiments show that ReMem provides a rigorous framework for diagnosing failures in both the learning and unlearning stages.
Key facts
- arXiv paper 2605.03759 identifies a stage-1 (memorization) failure in existing LVLM unlearning benchmarks
- Models fail to effectively memorize target information initially
- Under-memorization and multi-hop curse are root causes
- ReMem benchmark ensures robust foundational learning
- ReMem uses principled data scaling, reasoning-aware QA pairs, diverse visual contexts
- Novel Exposure metric quantifies information erasure depth
- Experiments demonstrate ReMem provides rigorous framework for diagnosing learning and unlearning failures
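The paper's exact formulation of the Exposure metric is not given in this summary, but the idea of measuring erasure depth from the model's internal probability distribution can be illustrated with a minimal sketch. The assumption here (not confirmed by the source) is that Exposure aggregates the log-probability the model assigns to the target answer tokens; the function name `exposure_score` and the toy distributions are hypothetical stand-ins for a real LVLM's next-token probabilities.

```python
import math

def exposure_score(token_probs, target_tokens):
    """Hypothetical Exposure-style metric: mean log-probability that the
    model assigns to the target answer tokens. Values near 0 mean the
    information is still exposed; strongly negative values suggest it has
    been erased from the model's output distribution."""
    logps = [math.log(token_probs[t]) for t in target_tokens]
    return sum(logps) / len(logps)

# Toy next-token distributions standing in for an LVLM's predictions
# before and after unlearning (illustrative numbers only).
before = {"Paris": 0.90, "London": 0.05, "Rome": 0.05}
after = {"Paris": 0.02, "London": 0.49, "Rome": 0.49}

target = ["Paris"]
print(exposure_score(before, target))  # near 0: target still exposed
print(exposure_score(after, target))   # strongly negative: largely erased
```

A distribution-level metric like this can detect residual memorization that answer-accuracy checks miss: a model may stop producing the target answer while still assigning it disproportionate probability mass.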
Entities
Institutions
- arXiv