DO-Bench Diagnoses Object Hallucination in VLMs
Researchers have introduced DO-Bench, a controlled diagnostic benchmark that pinpoints the underlying causes of object-level hallucination in vision-language models (VLMs). Whereas existing benchmarks report aggregate accuracy, DO-Bench uses structured multimodal interventions to distinguish errors caused by perceptual limitations from errors driven by contextual textual priors. It probes two dimensions: Prior Override, which strengthens contextual textual priors while holding visual evidence fixed, testing resistance to prior pressure; and Perception-Limited, which progressively strengthens visual evidence from full scenes to object-specific crops, measuring the strength of perceptual grounding. Together, the two dimensions aim to expose the underlying failure mechanisms in binary object existence verification, a central reliability concern for VLMs. The benchmark is described in arXiv:2604.22822v1.
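The two-dimension design can be sketched in code. The following is an illustrative mock-up, not the authors' actual implementation: all function names, prompt templates, and the toy model are hypothetical, chosen only to show how prior-injected prompts (visual evidence fixed) and evidence-strengthened inputs (prompt fixed) could separate prior-driven errors from perception-driven ones in a binary existence test.

```python
from dataclasses import dataclass

# Hypothetical sketch of DO-Bench-style probes; names and templates
# are illustrative, not taken from the paper.

@dataclass
class Probe:
    image: str      # image identifier (full scene or object crop)
    question: str   # binary existence query
    answer: str     # ground truth: "yes" or "no"
    condition: str  # which intervention produced this probe

def prior_override_probes(image, obj, context):
    """Strengthen the textual prior while holding visual evidence fixed.
    Ground truth here is 'no': the object is absent despite the prior."""
    neutral = f"Is there a {obj} in the image?"
    primed = (f"This is a {context} scene, which usually contains a {obj}. "
              f"Is there a {obj} in the image?")
    return [
        Probe(image, neutral, "no", "neutral"),
        Probe(image, primed, "no", "prior-injected"),
    ]

def perception_limited_probes(scene_img, crop_img, obj):
    """Strengthen visual evidence (full scene -> object crop) while
    holding the prompt fixed. Ground truth is 'yes': the object is present."""
    q = f"Is there a {obj} in the image?"
    return [
        Probe(scene_img, q, "yes", "full-scene"),
        Probe(crop_img, q, "yes", "object-crop"),
    ]

def hallucination_rate(model, probes):
    """Fraction of probes where the model's yes/no answer is wrong."""
    wrong = sum(model(p.image, p.question) != p.answer for p in probes)
    return wrong / len(probes)

# Toy model driven entirely by the textual prior: it says "yes" whenever
# the prompt asserts the object is typical for the scene.
def prior_driven_model(image, question):
    return "yes" if "usually contains" in question else "no"

probes = prior_override_probes("scene_001.jpg", "toaster", "kitchen")
rate = hallucination_rate(prior_driven_model, probes)
print(rate)  # 0.5: correct on the neutral query, fooled by the injected prior
```

A model with strong perceptual grounding would answer "no" under both conditions (rate 0.0), so the gap between the neutral and prior-injected conditions isolates susceptibility to textual priors rather than raw accuracy.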
Key facts
- DO-Bench is a controlled diagnostic benchmark for object-level hallucination in VLMs.
- It isolates errors from perceptual limitations versus contextual textual priors.
- The Prior Override dimension tests resistance to prior pressure.
- The Perception-Limited dimension measures perceptual grounding strength.
- It uses structured multimodal interventions.
- Existing benchmarks focus on aggregate accuracy.
- The benchmark addresses binary object existence verification.
- The research is described in arXiv:2604.22822v1.