ARTFEED — Contemporary Art Intelligence

AVES-DPO: Self-Corrected Preference Learning Reduces Hallucinations in LVLMs

other · 2026-04-29

Researchers propose AVES-DPO (Alignment via VErified Self-correction DPO), a framework for mitigating hallucinations in Large Vision-Language Models (LVLMs). Existing preference learning methods typically rely on responses from proprietary models, which introduces a distributional mismatch between the preference data and the target model's own outputs; AVES-DPO instead draws on the model's intrinsic knowledge to generate in-distribution preference pairs. A consensus-based verification mechanism diagnoses diverse hallucination types and guides the model to self-correct. In experiments, AVES-DPO surpasses existing baselines in hallucination mitigation while using only 5.2k training samples. The work is published on arXiv.
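
Concretely, the pipeline the summary describes could look something like the minimal Python sketch below. Everything here is an illustrative assumption, not the authors' code: the function names, the naive sentence-level claim splitting, and the consensus threshold are all placeholders for procedures the summary does not specify.

    from collections import Counter
    from typing import Callable


    def build_preference_pair(
        generate: Callable[[str], str],          # samples one LVLM response
        correct: Callable[[str, list[str]], str],  # asks the LVLM to rewrite a response
        prompt: str,
        n_samples: int = 8,
        consensus_threshold: float = 0.5,
    ) -> tuple[str, str]:
        """Return a (chosen, rejected) preference pair for DPO training."""
        # 1) Sample several responses from the model itself, so the pair
        #    stays in-distribution (no proprietary teacher model involved).
        samples = [generate(prompt) for _ in range(n_samples)]

        # 2) Consensus-based verification: count how many samples support
        #    each claim (naively approximated here as a sentence). Claims
        #    with low cross-sample support are flagged as suspected
        #    hallucinations.
        claim_support: Counter[str] = Counter()
        for s in samples:
            claim_support.update({c.strip() for c in s.split(".") if c.strip()})

        rejected = samples[0]
        suspects = [
            c.strip()
            for c in rejected.split(".")
            if c.strip() and claim_support[c.strip()] / n_samples < consensus_threshold
        ]

        # 3) Guided self-correction: the model rewrites its own response with
        #    the suspected hallucinations pointed out. If nothing is flagged,
        #    the pair is degenerate and callers may discard it.
        chosen = correct(rejected, suspects) if suspects else rejected
        return chosen, rejected

The resulting (chosen, rejected) pairs then serve as in-distribution preference data for DPO training, which per the summary requires only about 5.2k samples.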

Key facts

  • AVES-DPO stands for Alignment via VErified Self-correction DPO
  • Framework addresses distributional mismatch in preference learning
  • Uses consensus-based verification to diagnose hallucinations
  • Model self-corrects to generate preference pairs, which then feed a DPO-style objective (see the formula after this list)
  • Requires only 5.2k samples
  • Surpasses existing baselines in hallucination mitigation
  • Published on arXiv under Computer Science > Artificial Intelligence
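
For reference, self-corrected pairs of this kind would plug into the standard DPO objective (Rafailov et al., 2023); whether AVES-DPO modifies this loss is not stated in the summary:

    \mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
      = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
        \Big[ \log \sigma\Big( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
          - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \Big) \Big]

where y_w is the self-corrected (chosen) response, y_l the original (rejected) one, \pi_{\mathrm{ref}} the frozen reference model, and \beta the temperature of the implicit reward.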

Entities

Institutions

  • arXiv
