New Benchmark Diagnoses Pixel-Grounding Hallucinations in VLMs
Researchers have introduced Counterfactual Segmentation Reasoning (CSR), a task designed to diagnose and mitigate pixel-grounding hallucinations in segmentation Vision-Language Models (VLMs). These models often produce masks for incorrect or nonexistent objects, a problem overlooked by existing text- or label-based evaluations. To support CSR, the team curated HalluSegBench, the first large-scale benchmark for evaluating referring and reasoning expression segmentation hallucinations. The benchmark uses counterfactual images to test whether a model can correctly segment a referenced object in a factual image and abstain from segmentation in a counterfactual counterpart. This approach reveals vision-driven hallucinations, which are more challenging and prevalent than previously recognized. The work is published on arXiv under the title "Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination" (arXiv:2506.21546).
Key facts
- Counterfactual Segmentation Reasoning (CSR) is a new task to diagnose pixel-grounding hallucinations.
- HalluSegBench is the first large-scale benchmark for segmentation hallucinations.
- Existing evaluations rely on text- or label-based perturbations and overlook the spatial footprint of hallucinations.
- CSR requires models to segment in factual images and abstain in counterfactual counterparts.
- Vision-driven hallucinations are more challenging and prevalent than previously thought.
- The research is published on arXiv with ID 2506.21546.
- Counterfactual image pairs, in which the referenced object is visually edited, probe model robustness beyond text-only perturbations.
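The factual/counterfactual protocol above can be sketched as a simple scoring routine. This is a minimal illustration, not the paper's actual metric: masks are assumed to be boolean NumPy arrays, and the function names (`iou`, `csr_score`) are illustrative.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / union if union > 0 else 1.0

def csr_score(pred_factual: np.ndarray,
              gt_factual: np.ndarray,
              pred_counterfactual: np.ndarray) -> dict:
    """Score one factual/counterfactual image pair.

    The model should produce an accurate mask on the factual image
    and abstain (predict an empty mask) on the counterfactual image,
    where the referenced object has been removed or replaced.
    """
    return {
        "factual_iou": iou(pred_factual, gt_factual),
        # Any predicted pixels on the counterfactual image count as
        # a vision-driven hallucination under this toy scoring rule.
        "hallucinated": bool(pred_counterfactual.any()),
    }

# Toy 4x4 example: perfect factual mask, correct abstention.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True
pred_f = gt.copy()                       # matches ground truth
pred_cf = np.zeros((4, 4), dtype=bool)   # empty mask = abstain
print(csr_score(pred_f, gt, pred_cf))    # → {'factual_iou': 1.0, 'hallucinated': False}
```

A model that segments the (now absent) object in the counterfactual image would yield `hallucinated: True`, which is exactly the failure mode the benchmark is designed to surface.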