HalluScope Benchmark Reveals Textual Priors as Main Cause of LVLM Hallucinations
A new arXiv preprint (2604.21911v1) introduces HalluScope, a benchmark designed to identify the primary causes of hallucinations in large vision-language models (LVLMs). The analysis finds that hallucinations are largely driven by excessive reliance on textual priors and background knowledge, particularly priors introduced through textual instructions, rather than by limitations of the vision backbone or dominance of the language component. To address this, the authors propose HalluVL-DPO, a fine-tuning framework that steers off-the-shelf LVLMs toward more visually grounded responses via preference optimization. The work provides a systematic analysis of hallucination factors and a concrete mitigation strategy.
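The preprint's exact HalluVL-DPO recipe is not reproduced here, but the preference-optimization component it builds on is standard Direct Preference Optimization (DPO). As a minimal sketch, assuming per-response log-probabilities have already been computed under the trainable policy and a frozen reference copy of the LVLM, the core loss looks like this; variable names are illustrative, not the authors' code.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each argument is a tensor of summed token log-probabilities for a
    batch of responses: `chosen` = the visually grounded response,
    `rejected` = the hallucinated one, scored under the trainable
    policy and a frozen reference model respectively.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Maximize the policy's preference margin relative to the reference;
    # beta controls how far the policy may drift from the reference model.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```

In HalluVL-DPO's setting, the chosen/rejected pairs would presumably contrast visually grounded and hallucination-prone answers, though how those pairs are constructed is not specified in this summary.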
Key facts
- HalluScope benchmark proposed to understand factors inducing hallucinations in LVLMs
- Hallucinations stem from excessive reliance on textual priors and background knowledge (a simple probe for this is sketched after this list)
- Textual instructions are a key source of hallucination-inducing priors
- HalluVL-DPO framework fine-tunes LVLMs for visually grounded responses
- HalluVL-DPO leverages preference optimization
- Study published on arXiv with identifier 2604.21911v1
- Research resolves ambiguity about relative importance of hallucination factors
- Prior work attributed hallucinations to vision backbone limitations or language dominance
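The benchmark's diagnostic protocol is not described in this summary, but one common way to probe whether a response is driven by textual priors rather than the image is to score the same answer with the real image and with an uninformative one: a near-zero gap suggests text-prior reliance. The sketch below assumes a HuggingFace-style causal LVLM forward signature (`input_ids`, `pixel_values`), which varies across model families; it is an illustration, not HalluScope's method.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def response_logprob(model, prompt_ids, response_ids, pixel_values):
    """Summed log-probability of `response_ids` given the prompt and image.
    Assumes a causal LVLM whose forward accepts `pixel_values`; exact
    argument names differ between model families (an assumption here)."""
    ids = torch.cat([prompt_ids, response_ids], dim=-1)
    logits = model(input_ids=ids, pixel_values=pixel_values).logits
    # The logit at position t predicts the token at position t + 1.
    resp_logits = logits[:, prompt_ids.shape[-1] - 1 : -1, :]
    logps = F.log_softmax(resp_logits, dim=-1)
    return logps.gather(-1, response_ids.unsqueeze(-1)).sum()

@torch.no_grad()
def textual_prior_gap(model, prompt_ids, response_ids, image, blank_image):
    """Score the same response with the real image vs. a blank/noise image.
    A small gap means the image contributes little evidence, i.e. the
    response is plausibly driven by textual priors."""
    return (response_logprob(model, prompt_ids, response_ids, image)
            - response_logprob(model, prompt_ids, response_ids, blank_image))
```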