Attention Distraction Causes Hallucinations in MLLMs, New Algorithm Corrects It

ai-technology · 2026-05-26

A recent study published on arXiv links object hallucinations in multimodal large language models (MLLMs) to attention distraction, a phenomenon similar to one observed in humans. The researchers report that divided attention leaves humans with reduced visual clarity and erroneous descriptions, while MLLMs show two analogous symptoms: attention heads that disagree about where to look in the image, and attention to image tokens that fades as decoding proceeds. Their theoretical analysis suggests that this attention dispersion increases model complexity and degrades classification generalization. To mitigate the problem, they introduce the Attention-Focused Approach for Improved Image Perception (AFIP), which sharpens attention through cross-head attention enrichment and strengthens visual grounding through dynamic historical attention enhancement.
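
The two symptoms described above can be made concrete with a small diagnostic sketch. This is an illustration only, not the paper's measurement procedure: the function names, the use of total-variation distance, and the toy attention weights are all assumptions introduced here.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def head_disagreement(heads):
    """Mean pairwise total-variation distance between heads.

    `heads` is a list of per-head attention weights over the same image
    tokens. 0.0 means every head looks at the same place; 1.0 means the
    heads attend to completely disjoint tokens. (Hypothetical metric.)
    """
    dists = [normalize(h) for h in heads]
    total, pairs = 0.0, 0
    for i in range(len(dists)):
        for j in range(i + 1, len(dists)):
            total += 0.5 * sum(abs(a - b) for a, b in zip(dists[i], dists[j]))
            pairs += 1
    return total / pairs if pairs else 0.0


def image_attention_mass(step_attention, image_positions):
    """Share of one decoding step's attention that lands on image tokens."""
    return sum(step_attention[p] for p in image_positions) / sum(step_attention)


# Toy example: positions 0 and 1 are image tokens, position 2 is text.
early_step = [0.5, 0.3, 0.2]
late_step = [0.1, 0.1, 0.8]
fading = (image_attention_mass(early_step, [0, 1])
          > image_attention_mass(late_step, [0, 1]))
```

Under these assumptions, a high `head_disagreement` value would correspond to the spatial inconsistency the study describes, and a falling `image_attention_mass` across decoding steps to the temporal fading.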

Key facts

Paper published on arXiv with ID 2605.24602
Reveals link between object hallucinations in MLLMs and attention distraction
Attention distraction causes spatial inconsistency in multi-head attention
Temporal fading of attention to image tokens occurs during decoding
Attention dispersion increases model complexity and degrades classification generalization
Proposes AFIP algorithm to correct attention distraction
AFIP uses cross-head attention enrichment and dynamic historical attention enhancement
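
The summary does not give AFIP's actual formulas, so the following is only a rough sketch of what the two named components could look like. The blending rule, the parameters `alpha` and `beta`, and both function names are hypothetical and are not taken from the paper.

```python
def cross_head_enrich(heads, alpha=0.5):
    """Pull each head's weights toward the mean over all heads.

    Assumed rule (not from the paper): new = (1 - alpha) * own + alpha * mean.
    alpha=0 leaves heads unchanged; alpha=1 makes every head identical.
    """
    n = len(heads)
    mean = [sum(h[k] for h in heads) / n for k in range(len(heads[0]))]
    return [[(1 - alpha) * w + alpha * m for w, m in zip(h, mean)]
            for h in heads]


def boost_with_history(current, history, image_positions, beta=0.5):
    """Re-inject averaged past attention onto image tokens, then renormalize.

    Assumed rule (not from the paper): each image token gains beta times its
    mean attention over earlier decoding steps.
    """
    boosted = list(current)
    if history:
        for p in image_positions:
            past = sum(step[p] for step in history) / len(history)
            boosted[p] += beta * past
    total = sum(boosted)
    return [w / total for w in boosted]


# Two heads that disagree completely move closer together.
enriched = cross_head_enrich([[1.0, 0.0], [0.0, 1.0]], alpha=0.5)

# A late step that has drifted away from the image token (position 0)
# regains some image attention from an earlier step.
restored = boost_with_history([0.2, 0.8], [[0.6, 0.4]], image_positions=[0])
```

In this toy setup the first call reduces disagreement between heads and the second raises the share of attention on the image token, which is the qualitative behaviour the key facts attribute to AFIP.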
