ARTFEED — Contemporary Art Intelligence

GEASS: Training-Free Method Reduces Hallucination in Vision-Language Models

ai-technology · 2026-05-06

Researchers have introduced GEASS (Gated Evidence-Aware Selective Steering), a training-free approach to reducing object hallucination in Vision-Language Models (VLMs). Their findings show that naively incorporating self-generated captions can hurt performance, cutting Qwen2.5-VL-3B accuracy on HallusionBench by nearly 10 points. The decline stems from two structural characteristics: captions steer the model's reasoning and word choices, and caption errors are unevenly distributed, with omissions far outnumbering fabrications even though each fabrication does more individual damage. GEASS decides, per query, how much of the caption the model consumes by gating it on the confidence of the clean (caption-free) path and scaling it by entropy reduction. The work is published as arXiv:2605.01733.
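The gating idea can be illustrated with a minimal sketch. This is not the authors' implementation; the threshold, the normalization, and the linear blend of distributions are all illustrative assumptions, chosen only to show how a confidence gate combined with an entropy-reduction signal might decide how much caption evidence to consume:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def gated_blend(clean_probs, captioned_probs, conf_threshold=0.5):
    """Toy gate in the spirit of GEASS (all parameters hypothetical).

    If the clean path is already confident, ignore the caption
    entirely (gate = 0). Otherwise, weight the caption by how much
    it reduces uncertainty, i.e. the normalized entropy reduction,
    clipped to be non-negative.
    """
    clean_conf = max(clean_probs)  # confidence of the caption-free path
    if clean_conf >= conf_threshold:
        gate = 0.0  # confident already: don't consume the caption
    else:
        h_clean = entropy(clean_probs)
        h_cap = entropy(captioned_probs)
        # Entropy reduction from the caption, normalized by clean entropy.
        gate = max(0.0, (h_clean - h_cap) / h_clean) if h_clean > 0 else 0.0
    # Blend the two next-token distributions with the gate weight.
    blended = [(1 - gate) * c + gate * k
               for c, k in zip(clean_probs, captioned_probs)]
    return gate, blended
```

Under this sketch, a confident clean path fully suppresses the caption, while an uncertain one admits it only in proportion to how much it sharpens the prediction, which mirrors the paper's per-query gating on clean-path confidence and entropy reduction.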

Key facts

  • GEASS is a training-free module for hallucination mitigation in VLMs.
  • Naively embedding self-generated captions can drop Qwen2.5-VL-3B accuracy on HallusionBench by nearly 10 points.
  • Caption errors are asymmetric: omissions outnumber fabrications, but fabrications have larger per-instance impact.
  • GEASS gates caption consumption per query based on clean path confidence and entropy reduction.
  • The research is published on arXiv with ID 2605.01733.

Entities

Institutions

  • arXiv
