ARTFEED — Contemporary Art Intelligence

New Research Exposes Informativeness Bias in Vision-Language Model Judges

ai-technology · 2026-04-22

A new study has identified a serious flaw in using vision-language models (VLMs) to evaluate other VLMs. The research, posted to arXiv as 2604.17768v1, shows that judge models often ignore the actual image content when scoring answers, rewarding responses that sound more informative even when they contradict what the image shows. The authors call this failure "informativeness bias," and it significantly undermines the reliability of automated evaluation. To address it, they propose BIRCH (Balanced Informativeness and CoRrectness with a Truthful AnCHor), a framework that aligns candidate responses with the actual image content, correcting inconsistencies before any comparison takes place, so that judgments rest on image-grounded correctness rather than informativeness alone. In experiments, BIRCH reduced informativeness bias by up to 17% and improved judging performance by up to 9%.
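
The article does not detail BIRCH's exact procedure, so the following is a minimal Python sketch of the general idea as described above: build a truthful anchor from the image, correct the candidates against it, then judge. The function names, prompts, and the `vlm` callable are hypothetical illustrations, not the paper's implementation.

    # Minimal sketch of a VLM-as-a-Judge pipeline with a truthful-anchor
    # step. All names and prompts are hypothetical illustrations.
    from typing import Callable

    # Stand-in for any vision-language model call: takes an image plus a
    # text prompt, returns a text response.
    VLM = Callable[[bytes, str], str]

    def judge_with_anchor(vlm: VLM, image: bytes, question: str,
                          answer_a: str, answer_b: str) -> str:
        """Compare two candidate answers after grounding both in the image."""
        # Step 1: build a truthful anchor, a description taken directly
        # from the image, so later steps have grounded evidence.
        anchor = vlm(image, "Describe only what is visible in the image "
                            f"that is relevant to: {question}")

        # Step 2: correct each candidate against the anchor, removing
        # claims that conflict with the image before any comparison.
        def correct(answer: str) -> str:
            return vlm(image, f"Anchor description: {anchor}\n"
                              f"Candidate answer: {answer}\n"
                              "Rewrite the candidate, deleting any claim "
                              "that contradicts the anchor.")

        corrected_a, corrected_b = correct(answer_a), correct(answer_b)

        # Step 3: judge only the image-consistent candidates, so extra
        # detail can no longer outweigh correctness.
        return vlm(image, f"Question: {question}\n"
                          f"Answer A: {corrected_a}\n"
                          f"Answer B: {corrected_b}\n"
                          "Which answer better addresses the question? "
                          "Reply 'A' or 'B'.")

The point the sketch tries to capture is ordering: correction happens before comparison, so extra but ungrounded detail is stripped out before the judge has a chance to reward it.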

Key facts

  • Research paper arXiv:2604.17768v1 identifies "informativeness bias" in VLM-as-a-Judge systems
  • Vision-language models used as judges often ignore image content when evaluating answers
  • Judge models favor more informative answers even when they conflict with image content
  • This bias significantly undermines the reliability of automatic VLM evaluation
  • Researchers propose BIRCH (Balanced Informativeness and CoRrectness with a Truthful AnCHor) as a solution
  • BIRCH corrects inconsistencies in candidate answers before comparison
  • The new paradigm shifts focus from informativeness to image-grounded correctness
  • Experiments show BIRCH reduces bias by up to 17% and improves performance by up to 9% (a toy sketch of quantifying such bias follows this list)
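
As a rough illustration only (the paper's benchmark and metric are not described in this summary), informativeness bias could be quantified as the share of head-to-head judgments in which a detailed but image-contradicting answer beats a terse correct one. The `Case` records and numbers below are invented:

    # Toy illustration of quantifying informativeness bias: the fraction
    # of judgments where detail beat image-grounded correctness. Cases
    # and numbers are invented for illustration.
    from dataclasses import dataclass

    @dataclass
    class Case:
        correct_short: str    # terse answer consistent with the image
        wrong_detailed: str   # informative answer contradicting the image
        judge_pick: str       # judge's preference: "short" or "detailed"

    def informativeness_bias_rate(cases: list[Case]) -> float:
        """Share of judgments where the contradicting answer won."""
        biased = sum(1 for c in cases if c.judge_pick == "detailed")
        return biased / len(cases)

    cases = [
        Case("A red vase.", "An ornate blue Ming-era vase.", "detailed"),
        Case("Two dancers.", "Three dancers mid-tango.", "short"),
    ]
    print(f"bias rate: {informativeness_bias_rate(cases):.0%}")  # 50%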

Entities

Institutions

  • arXiv

Sources

  • arXiv:2604.17768v1 (https://arxiv.org/abs/2604.17768)