ARTFEED — Contemporary Art Intelligence

ReMem: A Reliable Benchmark for LVLM Unlearning

ai-technology · 2026-05-07

A recent investigation published on arXiv (2605.03759) reveals a significant flaw in existing unlearning benchmarks for Large Vision-Language Models (LVLMs): they do not verify that models have actually memorized the target information before unlearning begins, which makes assessments of unlearning unreliable. The researchers identify under-memorization and the multi-hop curse as the root causes. To address this, they propose ReMem (Reliable Multi-hop and Multi-image Memorization Benchmark), which enforces robust foundational learning through principled data scaling, reasoning-aware QA pairs, and diverse visual contexts. A novel Exposure metric additionally quantifies how deeply information has been erased from the model's internal probability distribution. Experiments show that ReMem provides a rigorous framework for diagnosing failures in both learning and unlearning.
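The paper's exact Exposure formula is not reproduced in this summary, so the following is only a minimal illustrative sketch of one plausible probability-based exposure measure: rank the target answer's log-probability against a set of distractor answers and report how much of the distribution still favors it. The function name `exposure`, the distractor set, and the rank-based formula are all assumptions for illustration, not the paper's definition.

```python
def sequence_logprob(token_logprobs):
    """Sum per-token log-probabilities to score a full target answer."""
    return sum(token_logprobs)

def exposure(target_logprob, distractor_logprobs):
    """Hypothetical exposure score in [0, 1]: how highly the model still
    ranks the target answer among plausible distractors.
    1.0 = target is the most likely answer (fully retained);
    0.0 = target ranks below every distractor (erased from the
    model's output distribution, not just from its top-1 answer)."""
    ranked = sorted(distractor_logprobs + [target_logprob], reverse=True)
    rank = ranked.index(target_logprob)  # 0 = most likely
    return 1.0 - rank / len(distractor_logprobs)

# Toy example: before unlearning the target outranks all distractors;
# after unlearning its log-probability drops below all of them.
distractors = [-5.2, -6.1, -7.8]
before = exposure(-1.3, distractors)  # -> 1.0
after = exposure(-9.4, distractors)   # -> 0.0
print(before, after)
```

The point of a distribution-level measure like this, as opposed to answer accuracy alone, is that a model can stop emitting the target answer while still assigning it suspiciously high probability.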

Key facts

  • arXiv paper 2605.03759 identifies a stage-1 (memorization) failure in LVLM unlearning benchmarks
  • Models often fail to memorize the target information in the first place
  • Under-memorization and multi-hop curse are root causes
  • ReMem benchmark ensures robust foundational learning
  • ReMem uses principled data scaling, reasoning-aware QA pairs, diverse visual contexts
  • Novel Exposure metric quantifies information erasure depth
  • Experiments demonstrate ReMem provides rigorous framework for diagnosing learning and unlearning failures
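The stage-1 failure described above can be made concrete with a small sketch of a two-stage diagnostic: check that the model memorized the target before scoring its unlearning. The function `diagnose` and both thresholds are hypothetical choices for illustration, not values from the paper.

```python
def diagnose(pre_unlearn_acc, post_unlearn_acc,
             memorize_threshold=0.9, forget_threshold=0.1):
    """Hypothetical two-stage check. An unlearning score is only
    meaningful if stage 1 (memorization) succeeded; otherwise a low
    post-unlearning accuracy proves nothing."""
    if pre_unlearn_acc < memorize_threshold:
        return "under-memorization: unlearning score unreliable"
    if post_unlearn_acc < forget_threshold:
        return "forgotten"
    return "retained"

print(diagnose(0.55, 0.02))  # stage-1 failure masks the result
print(diagnose(0.96, 0.03))  # memorized, then genuinely forgotten
```

This mirrors the benchmark's motivation: without the stage-1 gate, a model that never learned the fact looks identical to one that successfully unlearned it.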

Entities

Institutions

  • arXiv

Sources