ARTFEED — Contemporary Art Intelligence

New Benchmark Diagnoses Pixel-Grounding Hallucinations in VLMs

other · 2026-04-25

Researchers have introduced Counterfactual Segmentation Reasoning (CSR), a task designed to diagnose and mitigate pixel-grounding hallucinations in segmentation Vision-Language Models (VLMs). These models often produce masks for incorrect or nonexistent objects, a failure mode that existing text- and label-based evaluations overlook because they ignore the spatial footprint of the predicted mask. To support CSR, the team curated HalluSegBench, the first large-scale benchmark for evaluating hallucinations in referring and reasoning expression segmentation. The benchmark pairs each factual image with a counterfactual counterpart and tests whether a model correctly segments the referenced object in the former while abstaining from segmentation in the latter. This protocol exposes vision-driven hallucinations, which the authors find more challenging and prevalent than previously recognized. The work is published on arXiv under the title "Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination" (arXiv:2506.21546).

Key facts

  • Counterfactual Segmentation Reasoning (CSR) is a new task for diagnosing and mitigating pixel-grounding hallucinations in segmentation VLMs.
  • HalluSegBench is the first large-scale benchmark for evaluating hallucinations in referring and reasoning expression segmentation.
  • Existing evaluations rely on text- or label-based perturbations and overlook the spatial footprint of predicted masks.
  • CSR requires models to segment the referenced object in a factual image and abstain from segmentation in its counterfactual counterpart.
  • Vision-driven hallucinations prove more challenging and prevalent than previously recognized.
  • The paper is available on arXiv as arXiv:2506.21546.
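The paper's exact metrics are not given in this summary; as a rough sketch of the factual/counterfactual protocol described above, one could score each image pair with two numbers: mask quality on the factual image, and how much the model segments when it should abstain. The function and variable names below, and the toy masks, are illustrative assumptions, not the benchmark's actual implementation.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0

def score_csr_pair(factual_pred: np.ndarray,
                   factual_gt: np.ndarray,
                   counterfactual_pred: np.ndarray) -> tuple[float, float]:
    """Score one factual/counterfactual image pair.

    factual_iou: segmentation quality on the factual image (higher is better).
    hallucination_rate: fraction of pixels predicted in the counterfactual
    image, where the correct behavior is to abstain (lower is better).
    """
    factual_iou = iou(factual_pred, factual_gt)
    hallucination_rate = float(counterfactual_pred.mean())
    return factual_iou, hallucination_rate

# Toy 4x4 masks: a perfect factual prediction, plus one spurious
# pixel predicted in the counterfactual image.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True
factual_pred = gt.copy()
counterfactual_pred = np.zeros((4, 4), dtype=bool)
counterfactual_pred[0, 0] = True

print(score_csr_pair(factual_pred, gt, counterfactual_pred))  # (1.0, 0.0625)
```

An ideal model drives the first number toward 1.0 and the second toward 0.0; a model that hallucinates reuses the factual mask on the counterfactual image and scores a high hallucination rate.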

Entities

Institutions

  • arXiv

Sources