New Benchmark Diagnoses Pixel-Grounding Hallucinations in VLMs
Researchers have introduced Counterfactual Segmentation Reasoning (CSR), a task designed to diagnose and mitigate pixel-grounding hallucinations in segmentation Vision-Language Models (VLMs). These models often produce masks for incorrect or nonexistent objects, a problem overlooked by existing text- or label-based evaluations. To support CSR, the team curated HalluSegBench, the first large-scale benchmark for evaluating referring and reasoning expression segmentation hallucinations. The benchmark uses counterfactual images to test whether a model can correctly segment a referenced object in a factual image and abstain from segmentation in a counterfactual counterpart. This approach reveals vision-driven hallucinations, which are more challenging and prevalent than previously recognized. The work is published on arXiv under the title "Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination" (arXiv:2506.21546).
Key facts
- Counterfactual Segmentation Reasoning (CSR) is a new task to diagnose pixel-grounding hallucinations.
- HalluSegBench is the first large-scale benchmark for segmentation hallucinations.
- Existing evaluations rely on text- or label-based perturbations and overlook the spatial footprint of hallucinations.
- CSR requires models to segment in factual images and abstain in counterfactual counterparts.
- Vision-driven hallucinations are more challenging and prevalent than previously thought.
- The research is published on arXiv with ID 2506.21546.
- Counterfactual image pairs, in which the referenced object is visually edited, probe model robustness beyond text-only perturbations.
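The factual/counterfactual protocol above can be sketched as a simple scoring routine. This is a minimal illustration, not the paper's actual metric: masks are assumed to be boolean NumPy arrays, and the function names (`iou`, `csr_score`) are illustrative.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / union if union > 0 else 1.0

def csr_score(pred_factual: np.ndarray,
              gt_factual: np.ndarray,
              pred_counterfactual: np.ndarray) -> dict:
    """Score one factual/counterfactual image pair.

    The model should produce an accurate mask on the factual image
    and abstain (predict an empty mask) on the counterfactual image,
    where the referenced object has been removed or replaced.
    """
    return {
        "factual_iou": iou(pred_factual, gt_factual),
        # Any predicted pixels on the counterfactual image count as
        # a vision-driven hallucination under this toy scoring rule.
        "hallucinated": bool(pred_counterfactual.any()),
    }

# Toy 4x4 example: perfect factual mask, correct abstention.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True
pred_f = gt.copy()                       # matches ground truth
pred_cf = np.zeros((4, 4), dtype=bool)   # empty mask = abstain
print(csr_score(pred_f, gt, pred_cf))    # → {'factual_iou': 1.0, 'hallucinated': False}
```

A model that segments the (now absent) object in the counterfactual image would yield `hallucinated: True`, which is exactly the failure mode the benchmark is designed to surface.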