CHASD: New Method Reduces Hallucinations in Vision-Language Models
Researchers have introduced Contrastive Hallucination-Aware Step-wise Decoding (CHASD), a training-free framework that reduces object hallucinations in Large Vision-Language Models (LVLMs) at inference time. Hallucinations arise when language priors override weak or misaligned visual evidence. Existing contrastive decoding approaches either apply global perturbations or activate a negative branch at every decoding step, which can disrupt useful visual information. The researchers observe that hallucination risk is transient and token-specific: visual attention varies across generated tokens, and some tokens are produced confidently and need no adjustment. CHASD therefore performs 'calibration on demand,' applying contrastive decoding only at steps where hallucination risk is elevated, with an uncertainty-based strategy deciding when to intervene. The paper is available on arXiv (2605.23344v1), where it was announced as a cross-listing.
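The idea of intervening only at uncertain steps can be illustrated with a minimal sketch. This is not the paper's implementation: the normalized-entropy uncertainty measure, the threshold `tau`, the strength `alpha`, and every function name here are assumptions made for illustration. The contrastive formula is the form commonly used in prior contrastive decoding work, and the negative branch is passed as a callable so that it is only evaluated when the gate opens.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def normalized_entropy(probs):
    """Shannon entropy divided by log(vocab size), so the result lies in [0, 1]."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum() / np.log(len(p)))


def gated_contrastive_step(logits_full, negative_logits_fn, alpha=1.0, tau=0.5):
    """One decoding step with an uncertainty gate (hypothetical helper).

    logits_full:        next-token logits from the normal visual input.
    negative_logits_fn: zero-argument callable returning logits from a
                        hallucination-prone (negative) branch; it is only
                        called when the step is judged uncertain.
    Returns (probs, intervened).
    """
    probs_full = softmax(logits_full)
    # Assumed risk signal: high entropy means the model is unsure of the token.
    if normalized_entropy(probs_full) <= tau:
        return probs_full, False  # confident step: leave the distribution alone
    # Uncertain step: amplify what the full branch knows beyond the negative one.
    logits_negative = negative_logits_fn()
    contrasted = (1.0 + alpha) * logits_full - alpha * logits_negative
    return softmax(contrasted), True


# Toy vocabulary of four tokens.
confident = np.array([10.0, 0.0, 0.0, 0.0])
uncertain = np.array([1.0, 0.9, 0.0, 0.0])
# Toy negative branch that favors token 0 (standing in for a language prior).
negative = lambda: np.array([1.0, 0.0, 0.0, 0.0])

probs_a, intervened_a = gated_contrastive_step(confident, negative)
probs_b, intervened_b = gated_contrastive_step(uncertain, negative)
print(intervened_a, int(np.argmax(probs_a)))
print(intervened_b, int(np.argmax(probs_b)))
```

In this toy setup the confident step passes through unchanged, while the uncertain step is recalibrated and token 1 overtakes token 0, which the negative branch favored. Gating this way also avoids the extra forward pass for the negative branch on steps that do not need it.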
Key facts
- CHASD stands for Contrastive Hallucination-Aware Step-wise Decoding.
- It is a training-free inference-time framework for LVLMs.
- Hallucinations in LVLMs arise when language priors dominate weak or misaligned visual evidence.
- Existing contrastive decoding methods apply global perturbations or activate a negative branch at every decoding step.
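- Hallucination risk is transient and token-specific: some tokens are generated confidently and need no adjustment.
- CHASD performs calibration on demand, applying contrastive decoding only when uncertainty signals elevated hallucination risk.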
- The paper is available on arXiv with ID 2605.23344v1.
- The arXiv announcement type is "cross" (a cross-listing).
Entities
Institutions
- arXiv