Stage-wise Preference Optimization Reduces Hallucination in VLMs

ai-technology · 2026-05-20

A new framework aims to reduce hallucination in vision-language models (VLMs) by constructing hallucination-focused preference pairs near known failure boundaries. It targets ambiguous spatial orientation, object relationships, OCR uncertainty, and adversarial false-premise training. Hallucinated negatives are generated as minimally perturbed yet visually inconsistent alternatives to grounded answers, so that Direct Preference Optimization (DPO) can more effectively separate grounded reasoning from plausible hallucination. Training proceeds in stages, each concentrating on a particular hallucination type rather than on general instruction-following data. The reported experiments indicate a reduction in responses that are linguistically plausible but lack visual grounding.
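To make the idea of a hallucination-focused preference pair concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the stage names, the `PreferencePair` record, and the `perturb` helper are hypothetical, and the single-word swap is only a stand-in for whatever perturbation procedure the paper actually uses.

```python
from dataclasses import dataclass

# Hypothetical stage names, derived from the hallucination types named in the summary.
STAGES = [
    "spatial_orientation",
    "object_relationships",
    "ocr_uncertainty",
    "false_premise",
]


@dataclass(frozen=True)
class PreferencePair:
    stage: str
    image_id: str
    prompt: str
    chosen: str    # answer grounded in the image
    rejected: str  # near-identical answer that contradicts the image


def perturb(answer: str, old: str, new: str) -> str:
    """Swap one grounded detail for an inconsistent one (illustrative only)."""
    if old not in answer:
        raise ValueError(f"{old!r} not found in answer")
    return answer.replace(old, new, 1)


def make_pair(stage, image_id, prompt, grounded, old, new):
    """Build one pair whose negative differs from the positive by a single detail."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return PreferencePair(stage, image_id, prompt, grounded, perturb(grounded, old, new))


def group_by_stage(pairs):
    """Bucket pairs so that each training stage sees one hallucination type."""
    buckets = {stage: [] for stage in STAGES}
    for pair in pairs:
        buckets[pair.stage].append(pair)
    return buckets


pair = make_pair(
    "spatial_orientation",
    "img_001",  # hypothetical image identifier
    "Where is the cup?",
    "The cup is to the left of the laptop.",
    "left",
    "right",
)
buckets = group_by_stage([pair])
```

The rejected answer stays fluent and plausible while contradicting the image in exactly one place, which is the property the summary attributes to the hallucinated negatives. Grouping by stage mirrors the stage-wise schedule, although the actual ordering and mixing of stages is not specified here.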

Key facts

Hallucination remains a fundamental challenge in vision-language models (VLMs).
Autoregressive generation may produce physically inconsistent or visually ungrounded responses.
The proposed framework uses stage-wise preference optimization.
It constructs hallucination-focused preference pairs near known failure boundaries.
The framework emphasizes ambiguous spatial orientation, object relationships, OCR uncertainty, and adversarial false-premise training.
Hallucinated negatives are generated through minimally perturbed yet visually inconsistent alternatives.
Direct Preference Optimization (DPO) is used to separate grounded reasoning from plausible hallucination.
The approach is detailed in arXiv:2605.16411.
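
For readers unfamiliar with DPO, the sketch below computes the standard per-pair DPO loss from sequence log-probabilities under the trained policy and a frozen reference model. It shows only the generic objective, not anything specific to this paper; the function name, argument names, and the `beta` value of 0.1 are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response.
    beta scales how strongly the policy is pushed away from the reference
    (0.1 is an illustrative value, not one taken from the paper).
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) written as softplus(-margin) for numerical stability.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy prefers the grounded answer more than the reference does: loss below log(2).
better = dpo_loss(-1.0, -3.0, -2.0, -2.0)
# Policy prefers the hallucinated answer more than the reference does: loss above log(2).
worse = dpo_loss(-3.0, -1.0, -2.0, -2.0)
```

When the grounded and hallucinated answers differ by only a small perturbation, the two log-probabilities start out close together, so pairs of this kind concentrate the training signal on the one detail that separates them.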
