ARTFEED — Contemporary Art Intelligence

Geometric Over-Alignment Causes Hallucinations in Vision-Language Models

ai-technology · 2026-05-12

A new arXiv preprint (2605.08245) identifies geometric over-alignment as the root cause of hallucinations in decoder-based Vision-Language Models (VLMs). The researchers trace failures to visual embeddings being over-aligned with the text manifold, which injects linguistic bias that overshadows fine-grained visual evidence. The paper offers the first quantitative characterization of the phenomenon, showing that the bias concentrates in the top principal components of the embedding space. Prior work either aggressively closes the modality gap or relies on expensive black-box decoding, but neither approach addresses the underlying geometric cause. The findings carry implications for high-stakes applications such as medical imaging and autonomous systems.
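
The study's measurements are not reproduced here, but the kind of diagnostic it describes can be sketched. The snippet below is a minimal illustration, assuming synthetic NumPy arrays in place of real VLM vision- and text-encoder outputs; the names text_emb and vis_emb and the cutoff k = 10 are placeholders, not values from the paper. It estimates how much visual-embedding variance falls along the top principal components of the text manifold, the region where the study says bias concentrates.

    # Hedged sketch, not the paper's method: estimate how much visual-
    # embedding variance lies along the top principal components of the
    # text embedding space. All arrays are synthetic placeholders; real
    # inputs would come from a VLM's vision and text encoders.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 512                       # samples, embedding dimension
    text_emb = rng.normal(size=(n, d))     # placeholder text embeddings
    vis_emb = rng.normal(size=(n, d))      # placeholder visual embeddings

    # Principal directions of the centered text manifold via SVD.
    text_centered = text_emb - text_emb.mean(axis=0)
    _, _, vt = np.linalg.svd(text_centered, full_matrices=False)

    k = 10                                 # illustrative cutoff, not from the paper
    top_pcs = vt[:k]                       # (k, d) top text principal directions

    # Fraction of visual-embedding variance captured by those k directions.
    vis_centered = vis_emb - vis_emb.mean(axis=0)
    proj = vis_centered @ top_pcs.T        # (n, k) projection coefficients
    frac = (proj ** 2).sum() / (vis_centered ** 2).sum()
    print(f"visual variance along top-{k} text PCs: {frac:.3f}")

Under the paper's framing, an over-aligned model would show this fraction far above what chance geometry predicts; the random placeholders here yield a near-chance baseline.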

Key facts

  • Study investigates root causes of hallucinations in decoder-based VLMs.
  • The modality gap is bridged by over-aligning visual embeddings with the text manifold.
  • Linguistic bias systematically overshadows fine-grained visual evidence.
  • First quantitative characterization of over-alignment in VLMs.
  • Bias concentrates in top principal components.
  • Prior work relies on aggressive gap-closing or expensive black-box decoding rather than geometric correction (a correction sketch follows this list).
  • Implications for medical imaging and autonomous systems.
  • Paper available on arXiv with ID 2605.08245.
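
As referenced in the list above, a correction aimed at the geometry would act on the embeddings themselves rather than on decoding. The sketch below shows one plausible form such a correction could take, damping each visual embedding's projection onto the top text principal directions; remove_top_components, the strength parameter, and the synthetic data are assumptions for illustration, not the paper's algorithm.

    # Hedged sketch of a geometric correction, not the paper's algorithm:
    # remove (part of) each visual embedding's component along the top
    # text principal directions, where the bias is said to concentrate.
    import numpy as np

    def remove_top_components(vis_emb, top_pcs, strength=1.0):
        """Subtract `strength` times the projection of each row of
        `vis_emb` onto the span of `top_pcs` (rows assumed orthonormal)."""
        proj = (vis_emb @ top_pcs.T) @ top_pcs   # (n, d) projection onto span
        return vis_emb - strength * proj

    # Example usage with synthetic data; `top_pcs` would normally be the
    # directions computed in the earlier sketch.
    rng = np.random.default_rng(1)
    vis_emb = rng.normal(size=(100, 512))
    top_pcs = np.linalg.qr(rng.normal(size=(512, 10)))[0].T  # orthonormal rows
    corrected = remove_top_components(vis_emb, top_pcs, strength=0.5)
    print(corrected.shape)                   # (100, 512), bias directions damped

The strength parameter keeps the correction partial by design: the study frames over-alignment, not alignment itself, as the problem, so fully zeroing those directions would simply reopen the modality gap that prior work closes.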

Entities

Institutions

  • arXiv

Sources

  • arXiv:2605.08245 (https://arxiv.org/abs/2605.08245)