ARTFEED — Contemporary Art Intelligence

New Research Exposes Informativeness Bias in Vision-Language Model Judges

ai-technology · 2026-04-22

A new study has identified a serious flaw in using vision-language models (VLMs) to evaluate other VLMs. The research, posted to arXiv as 2604.17768v1, shows that judge models often ignore the actual image content when scoring answers, rewarding responses that sound more informative even when they contradict what the image shows. The authors call this failure "informativeness bias," and it significantly undermines the reliability of automated evaluation. To address it, they propose BIRCH (Balanced Informativeness and CoRrectness with a Truthful AnCHor), a framework that aligns candidate responses with the actual image content, correcting inconsistencies before any comparison takes place, so that judgments rest on image-grounded correctness rather than informativeness alone. In experiments, BIRCH reduced informativeness bias by up to 17% and improved judging performance by up to 9%.
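
The article does not detail BIRCH's exact procedure, so the following is a minimal Python sketch of the general idea as described above: build a truthful anchor from the image, correct the candidates against it, then judge. The function names, prompts, and the `vlm` callable are hypothetical illustrations, not the paper's implementation.

    # Minimal sketch of a VLM-as-a-Judge pipeline with a truthful-anchor
    # step. All names and prompts are hypothetical illustrations.
    from typing import Callable

    # Stand-in for any vision-language model call: takes an image plus a
    # text prompt, returns a text response.
    VLM = Callable[[bytes, str], str]

    def judge_with_anchor(vlm: VLM, image: bytes, question: str,
                          answer_a: str, answer_b: str) -> str:
        """Compare two candidate answers after grounding both in the image."""
        # Step 1: build a truthful anchor, a description taken directly
        # from the image, so later steps have grounded evidence.
        anchor = vlm(image, "Describe only what is visible in the image "
                            f"that is relevant to: {question}")

        # Step 2: correct each candidate against the anchor, removing
        # claims that conflict with the image before any comparison.
        def correct(answer: str) -> str:
            return vlm(image, f"Anchor description: {anchor}\n"
                              f"Candidate answer: {answer}\n"
                              "Rewrite the candidate, deleting any claim "
                              "that contradicts the anchor.")

        corrected_a, corrected_b = correct(answer_a), correct(answer_b)

        # Step 3: judge only the image-consistent candidates, so extra
        # detail can no longer outweigh correctness.
        return vlm(image, f"Question: {question}\n"
                          f"Answer A: {corrected_a}\n"
                          f"Answer B: {corrected_b}\n"
                          "Which answer better addresses the question? "
                          "Reply 'A' or 'B'.")

The point the sketch tries to capture is ordering: correction happens before comparison, so extra but ungrounded detail is stripped out before the judge has a chance to reward it.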

Key facts

  • Research paper arXiv:2604.17768v1 identifies "informativeness bias" in VLM-as-a-Judge systems
  • Vision-language models used as judges often ignore image content when evaluating answers
  • Judge models favor more informative answers even when they conflict with image content
  • This bias significantly undermines the reliability of automatic VLM evaluation
  • Researchers propose BIRCH (Balanced Informativeness and CoRrectness with a Truthful AnCHor) as a solution
  • BIRCH corrects inconsistencies in candidate answers before comparison
  • The new paradigm shifts focus from informativeness to image-grounded correctness
  • Experiments show BIRCH reduces bias by up to 17% and improves performance by up to 9% (a toy sketch of quantifying such bias follows this list)
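
As a rough illustration only (the paper's benchmark and metric are not described in this summary), informativeness bias could be quantified as the share of head-to-head judgments in which a detailed but image-contradicting answer beats a terse correct one. The `Case` records and numbers below are invented:

    # Toy illustration of quantifying informativeness bias: the fraction
    # of judgments where detail beat image-grounded correctness. Cases
    # and numbers are invented for illustration.
    from dataclasses import dataclass

    @dataclass
    class Case:
        correct_short: str    # terse answer consistent with the image
        wrong_detailed: str   # informative answer contradicting the image
        judge_pick: str       # judge's preference: "short" or "detailed"

    def informativeness_bias_rate(cases: list[Case]) -> float:
        """Share of judgments where the contradicting answer won."""
        biased = sum(1 for c in cases if c.judge_pick == "detailed")
        return biased / len(cases)

    cases = [
        Case("A red vase.", "An ornate blue Ming-era vase.", "detailed"),
        Case("Two dancers.", "Three dancers mid-tango.", "short"),
    ]
    print(f"bias rate: {informativeness_bias_rate(cases):.0%}")  # 50%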

Entities

Institutions

  • arXiv

Sources

  • arXiv:2604.17768v1 (https://arxiv.org/abs/2604.17768)