Attention Head Imbalance Drives Modality-Conflict Hallucination in MLLMs
A new study on arXiv (2605.19250) investigates modality-conflict hallucination in multimodal large language models (MLLMs), in which a model follows an incorrect textual premise even though the visual evidence contradicts it. Applying head-level causal analysis (path patching) to five open-source MLLMs, the researchers identified two groups of attention heads: hallucination-driving heads and hallucination-resisting heads. They found a consistent asymmetry: driving effects are spread across many heads and carry greater aggregate weight, while resisting effects concentrate in a small number of high-importance heads. Ablation experiments confirmed that the two groups have opposing effects during generation, and that distributed driving influence combined with localized resistance creates an imbalanced routing structure that leads to hallucination. The study provides causal evidence for this imbalance, offering a mechanistic explanation for why visual evidence fails to prevail during generation.
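To make the idea of head-level causal analysis concrete, here is a minimal sketch of patching one attention head at a time in a toy one-layer attention model. It is a simplified activation-patching stand-in, not the study's path-patching procedure: the model, its random weights, the readout vector, and the names `head_outputs`, `readout`, and `patch_each_head` are all hypothetical, and the "conflict" input is just a perturbed copy of the clean one.

```python
import numpy as np

def head_outputs(x, heads):
    """Return one (T, d_head) output per attention head for input x of shape (T, d)."""
    outs = []
    for Wq, Wk, Wv in heads:
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(q.shape[-1])
        scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
        attn = np.exp(scores)
        attn = attn / attn.sum(axis=-1, keepdims=True)        # row-wise softmax
        outs.append(attn @ v)
    return outs

def readout(outs, w_out):
    """Scalar score read from the last position of the concatenated head outputs."""
    concat = np.concatenate(outs, axis=-1)  # (T, n_heads * d_head)
    return float(concat[-1] @ w_out)

def patch_each_head(x_clean, x_conflict, heads, w_out):
    """Swap each head's conflict-run output for its clean-run output, one at a time."""
    clean = head_outputs(x_clean, heads)
    conflict = head_outputs(x_conflict, heads)
    base = readout(conflict, w_out)
    effects = []
    for h in range(len(heads)):
        patched = list(conflict)
        patched[h] = clean[h]                 # patch only head h
        effects.append(readout(patched, w_out) - base)
    return base, readout(clean, w_out), effects

# Hypothetical toy setup: random weights, not a real MLLM.
rng = np.random.default_rng(0)
T, d, d_head, n_heads = 5, 8, 4, 3
heads = [tuple(rng.normal(size=(d, d_head)) for _ in range(3)) for _ in range(n_heads)]
w_out = rng.normal(size=n_heads * d_head)
x_clean = rng.normal(size=(T, d))
x_conflict = x_clean.copy()
x_conflict[0] += rng.normal(size=d)  # perturbed token standing in for a misleading premise

base, clean_score, effects = patch_each_head(x_clean, x_conflict, heads, w_out)
for h, effect in enumerate(effects):
    print(f"head {h}: effect of patching = {effect:+.4f}")
```

In this toy, each head reaches the readout only directly and the readout is linear, so the per-head effects add up to the full gap between the clean and conflict scores. In a real multi-layer model heads also influence one another, which is why methods such as path patching restrict the measurement to specific routes.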
Key facts
- Study examines modality-conflict hallucination in MLLMs
- Uses head-level causal analysis via path patching
- Analyzed five open-source MLLMs
- Identified hallucination-driving and hallucination-resisting attention heads
- Driving effects are broadly distributed and carry greater aggregate weight
- Resisting effects concentrate in a small number of high-importance heads
- Ablation experiments confirm that the two head groups have opposing effects during generation
- Imbalanced routing structure underlies hallucination
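The asymmetry described in the facts above (many driving heads with greater aggregate weight, few resisting heads with high individual importance) can be expressed as two simple statistics: each group's total effect and the share of that total held by its top heads. The sketch below assumes a signed per-head score where positive means hallucination-driving and negative means hallucination-resisting; that sign convention, the function `summarize_head_effects`, and the toy scores are illustrative assumptions, not the study's metric or data.

```python
import numpy as np

def summarize_head_effects(effects, top_k=1):
    """Summarize signed per-head effects (assumed: positive = drives hallucination)."""
    flat = np.asarray(effects, dtype=float).ravel()
    driving = np.sort(flat[flat > 0])[::-1]     # driving magnitudes, largest first
    resisting = np.sort(-flat[flat < 0])[::-1]  # resisting magnitudes, largest first

    def top_share(values):
        # Fraction of the group's total effect carried by its top_k heads.
        total = values.sum()
        return float(values[:top_k].sum() / total) if total > 0 else 0.0

    return {
        "n_driving": int(driving.size),
        "n_resisting": int(resisting.size),
        "driving_total": float(driving.sum()),
        "resisting_total": float(resisting.sum()),
        "driving_top_share": top_share(driving),
        "resisting_top_share": top_share(resisting),
    }

# Made-up scores for a 2-layer, 4-head toy model; these are not values from the study.
toy_effects = np.array([
    [0.20, 0.10, 0.15, -0.90],
    [0.10, 0.25, 0.20, -0.05],
])
summary = summarize_head_effects(toy_effects, top_k=1)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these toy numbers the driving group has more heads and a slightly larger total, while almost all of the resisting total sits in a single head. The example only mirrors the qualitative pattern the study reports and says nothing about the magnitudes measured in real models.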
Entities
Institutions
- arXiv