ARTFEED — Contemporary Art Intelligence

Research reveals how vision-language models hallucinate by favoring text prompts over visual evidence

ai-technology · 2026-04-20

A new study examines how large vision-language models (VLMs) produce hallucinations by prioritizing textual prompts over the actual visual content of an image. The researchers probed this failure in a controlled object-counting setting, pairing images with prompts that overstated the number of objects present. At low object counts the models frequently corrected the overstatement, but as the counts grew they increasingly conformed to the misleading prompts despite the contradicting visual evidence. Through mechanistic analysis of three VLMs, the team identified a small set of attention heads, termed prompt-induced-hallucination (PIH) heads, that drive this behavior. Ablating these heads reduced hallucinations by at least 40% without any additional training and shifted the models' answers back toward the visual evidence. The study also shows that PIH-heads mediate prompt copying in model-specific ways across the different architectures, offering insight into the internal mechanisms behind these errors.
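The intervention described above amounts to silencing a handful of attention heads at inference time. The toy module below is a minimal sketch of that idea, not the authors' code: the dimensions and head indices are hypothetical, and in a real VLM the masking would be applied inside the model's own attention layers (for example via forward hooks) rather than in a standalone module.

```python
# Minimal sketch of attention-head ablation: zero the output of selected heads
# before the attention block's output projection. All sizes and head indices
# below are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn

class ToyMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.ablated_heads: set[int] = set()  # indices of heads to silence

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape each to (batch, heads, seq, d_head)
        q, k, v = (z.reshape(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = attn @ v  # per-head outputs: (batch, heads, seq, d_head)
        if self.ablated_heads:
            keep = torch.ones(self.n_heads, device=x.device)
            keep[list(self.ablated_heads)] = 0.0
            heads = heads * keep.view(1, -1, 1, 1)  # zero the ablated heads
        merged = heads.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(merged)

mha = ToyMultiHeadAttention()
mha.ablated_heads = {2, 5}                 # hypothetical "PIH-head" indices
out = mha(torch.randn(1, 10, 64))          # forward pass with those heads silenced
```

Because the masking only changes the forward pass, no retraining is involved, which is the property the study highlights.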

Key facts

  • Large vision-language models often hallucinate by favoring textual prompts over visual evidence
  • Study used a controlled object-counting setting with prompts overstating the true object count (see the sketch after this list)
  • Models corrected overestimations at low object counts but conformed to prompts at higher counts
  • Mechanistic analysis identified specific attention heads causing prompt-induced hallucinations
  • Ablation of these heads reduced hallucinations by at least 40% without additional training
  • PIH-heads mediate prompt copying in model-specific ways across different VLMs
  • Ablation increased correction toward visual evidence in the models
  • Research offers insights into internal mechanisms of vision-language model failures
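As a rough illustration of the object-counting probe referenced in the key facts, the snippet below pairs an image with a known count against prompts that overstate it, then checks whether the model's answer returns to the true value. Everything here is an assumption for illustration: query_vlm is a placeholder for whichever model interface is actually used, and the number-matching check is a crude stand-in for the paper's evaluation.

```python
# Illustrative probe (all names and values are assumptions, not from the paper):
# send a VLM prompts that overstate a known object count and check whether its
# answer corrects back toward the visual evidence.
import re

def make_overstated_prompt(obj: str, claimed: int) -> str:
    return f"There are {claimed} {obj}s in this image. How many {obj}s do you see?"

def query_vlm(image_path: str, prompt: str) -> str:
    # Placeholder: swap in a real vision-language model call here.
    return "I count 3 apples."

def answer_matches_truth(answer: str, true_count: int) -> bool:
    numbers = [int(n) for n in re.findall(r"\d+", answer)]
    return true_count in numbers  # crude check: the true count appears in the answer

true_count, image = 3, "apples.jpg"        # hypothetical test image with 3 objects
for claimed in (4, 6, 10, 20):             # increasingly exaggerated prompt counts
    reply = query_vlm(image, make_overstated_prompt("apple", claimed))
    status = "corrected" if answer_matches_truth(reply, true_count) else "conformed"
    print(f"claimed={claimed}: {status}")
```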
