AVES-DPO: Self-Corrected Preference Learning Reduces Hallucinations in LVLMs
Researchers propose AVES-DPO (Alignment via VErified Self-correction DPO), a framework for mitigating hallucinations in Large Vision-Language Models (LVLMs). Existing preference-learning methods rely on responses from proprietary models, which introduces a distributional mismatch with the model being aligned; AVES-DPO instead uses the model's own intrinsic knowledge to generate in-distribution preference pairs. A consensus-based verification mechanism diagnoses diverse hallucinations and guides the model to self-correct, and the self-corrected outputs then form the preference pairs. Experiments show AVES-DPO surpasses baselines in hallucination mitigation using only 5.2k samples. The work is published on arXiv.
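The paper's exact pipeline is not given in this summary, but the described loop (draft a response, verify it by consensus over extra samples, self-correct, pair corrected against original) can be sketched as below. Every name here (`build_preference_pair`, `extract_claims`, `claim_supported`, `num_votes`) and the sentence-level, lexical-overlap consensus heuristic are illustrative assumptions, not the authors' code.

```python
"""Minimal sketch of self-corrected preference-pair generation.
All helpers and heuristics are placeholder assumptions, not AVES-DPO's
actual implementation."""

from typing import Callable, Optional


def extract_claims(text: str) -> list[str]:
    # Placeholder claim extractor: treat each sentence as one claim.
    return [s.strip() for s in text.split(".") if s.strip()]


def claim_supported(claim: str, other: str) -> bool:
    # Placeholder support check: crude lexical overlap with another sample.
    words = set(claim.lower().split())
    return len(words & set(other.lower().split())) >= len(words) // 2


def build_preference_pair(
    generate: Callable[[str], str],  # wraps the LVLM on a fixed image
    question: str,
    num_votes: int = 5,
) -> Optional[dict]:
    """Build an in-distribution (chosen, rejected) pair from the model itself."""
    draft = generate(question)

    # Consensus-based verification: sample extra responses and flag any
    # claim in the draft that a majority of the samples do not support.
    votes = [generate(question) for _ in range(num_votes)]
    flagged = [
        c for c in extract_claims(draft)
        if sum(claim_supported(c, v) for v in votes) <= num_votes // 2
    ]
    if not flagged:
        return None  # nothing diagnosed as hallucinated

    # Guided self-correction: re-prompt the model with the diagnosis.
    prompt = (
        f"{question}\nYour earlier answer contained unsupported claims: "
        f"{flagged}. Rewrite it using only what the image shows."
    )
    corrected = generate(prompt)

    # The self-corrected answer is preferred over the hallucinated draft.
    return {"chosen": corrected, "rejected": draft}
```

Any callable that queries the LVLM with a fixed image and a text prompt can stand in for `generate`; because both sides of the pair come from the model itself, the pairs stay in-distribution by construction.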
Key facts
- AVES-DPO stands for Alignment via VErified Self-correction DPO
- Framework addresses distributional mismatch in preference learning
- Uses consensus-based verification to diagnose hallucinations
- Model self-corrects to generate preference pairs, which feed a DPO objective (sketched after this list)
- Requires only 5.2k samples
- Surpasses existing baselines in hallucination mitigation
- Published on arXiv under Computer Science > Artificial Intelligence
- Submission history available on arXiv
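As the method's name indicates, the self-corrected pairs are used for DPO training. The sketch below is the textbook DPO loss (Rafailov et al., 2023) written in PyTorch, shown only to make the training step concrete; it is an assumption that AVES-DPO uses this standard form rather than a modified objective.

```python
import torch
import torch.nn.functional as F


def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_policy(chosen | prompt)
    policy_rejected_logps: torch.Tensor,  # log p_policy(rejected | prompt)
    ref_chosen_logps: torch.Tensor,       # log p_ref(chosen | prompt)
    ref_rejected_logps: torch.Tensor,     # log p_ref(rejected | prompt)
    beta: float = 0.1,
) -> torch.Tensor:
    """Standard DPO loss over summed token log-probs of each response:
    -log sigmoid(beta * [(log-ratio under policy) - (log-ratio under reference)]).
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
```

Here the self-corrected response plays the "chosen" role and the original hallucinated draft the "rejected" role; the frozen pre-alignment model serves as the reference policy.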