CoVUBench: New Benchmark for Copyright Unlearning in LVLMs

ai-technology · 2026-05-07

Researchers have introduced CoVUBench, described as the first benchmark for evaluating how well Large Vision-Language Models (LVLMs) unlearn copyrighted content. Because these models are trained on extensive web data, they may inadvertently memorize and reproduce copyrighted visuals such as logos and characters. Machine unlearning offers a way to remove specific content after training, but measuring whether it has succeeded is difficult in multimodal settings. CoVUBench addresses this gap with procedurally generated, legally compliant synthetic data. The data incorporates systematic visual variations, including compositional alterations and representations across different domains, to test whether unlearning generalizes beyond the exact images a model was asked to forget. The benchmark aims to give a thorough and realistic picture of how well LVLMs forget copyrighted content while preserving their overall performance.

Key facts

CoVUBench is the first benchmark for evaluating copyright unlearning in LVLMs.
LVLMs can memorize and regenerate copyrighted visual content like characters and logos.
Machine unlearning aims to remove specific content from a model after training.
Current evaluation methods lack robustness for multimodal settings.
CoVUBench uses procedurally generated synthetic data.
The data includes systematic visual variations such as compositional changes.
The benchmark evaluates unlearning generalization across diverse domain manifestations.
The synthetic data is legally compliant.
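
To make the idea of "unlearning generalization" concrete, here is a minimal Python sketch of how results might be summarized: a forget rate per visual variation, plus accuracy on content the model should retain. This is an illustration only, not CoVUBench's actual metrics or code. The record fields, the variation labels ("original", "compositional"), and the summarize function are all hypothetical, and the records are made-up toy values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EvalRecord:
    split: str        # "forget" (content to unlearn) or "retain" (content to keep)
    variation: str    # hypothetical variation label, e.g. "original", "compositional"
    recognized: bool  # True if the model still identified the content


def summarize(records):
    """Return (forget rate per variation, accuracy on the retain split)."""
    forget_hits = defaultdict(list)
    retain_hits = []
    for r in records:
        if r.split == "forget":
            forget_hits[r.variation].append(r.recognized)
        else:
            retain_hits.append(r.recognized)
    # Forget rate: share of forget-split items the model no longer recognizes.
    forget_rate = {v: 1 - sum(h) / len(h) for v, h in forget_hits.items()}
    # Retain accuracy: share of retain-split items the model still recognizes.
    retain_acc = sum(retain_hits) / len(retain_hits) if retain_hits else float("nan")
    return forget_rate, retain_acc


# Toy records, invented for illustration (not benchmark data).
records = [
    EvalRecord("forget", "original", False),
    EvalRecord("forget", "original", False),
    EvalRecord("forget", "compositional", False),
    EvalRecord("forget", "compositional", True),
    EvalRecord("retain", "n/a", True),
    EvalRecord("retain", "n/a", True),
    EvalRecord("retain", "n/a", True),
    EvalRecord("retain", "n/a", False),
]

forget_rate, retain_acc = summarize(records)
print(forget_rate)
print(retain_acc)
```

In this toy case the model forgets every original image but only half of the compositional variants, the kind of gap that an evaluation restricted to the exact forgotten images would miss.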

Sources

arXiv cs.AI — 2026-05-06