SIRA: Training-Free Method to Reduce LVLM Hallucinations
Researchers have introduced SIRA (Shared-Prefix Internal Reconstruction of Attribution), a training-free internal contrastive decoding framework for reducing hallucinations in large vision-language models (LVLMs). Existing contrastive decoding techniques compare predictions on the original image with predictions on externally perturbed visual inputs, which can introduce off-manifold artifacts and requires expensive extra forward passes. SIRA instead constructs a counterfactual reference inside the same LVLM: it exploits the staged information flow of multimodal transformers, letting image and text tokens interact through a shared prefix to form an aligned multimodal state before forking a counterfactual branch in the later layers. The method is detailed in arXiv paper 2605.14621.
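To make the contrastive decoding step concrete, the sketch below combines next-token logits from the original and counterfactual branches, penalizing tokens the counterfactual branch favors. This is a minimal sketch following standard contrastive decoding conventions; the combination rule, the adaptive plausibility cutoff, and the `alpha`/`beta` hyperparameters are assumptions for illustration, not values or formulas taken from the SIRA paper.

```python
import torch
import torch.nn.functional as F

def contrastive_decode_step(logits_orig: torch.Tensor,
                            logits_cf: torch.Tensor,
                            alpha: float = 1.0,
                            beta: float = 0.1) -> torch.Tensor:
    """Combine original and counterfactual next-token logits.

    Tokens that the counterfactual branch also scores highly are likely
    ungrounded, so their scores are pushed down. `alpha` (contrast
    strength) and `beta` (plausibility cutoff) are illustrative.
    """
    log_p = F.log_softmax(logits_orig, dim=-1)
    log_q = F.log_softmax(logits_cf, dim=-1)
    # Keep only tokens whose original probability is within a factor
    # `beta` of the best token (standard adaptive plausibility mask,
    # assumed here rather than confirmed for SIRA).
    cutoff = log_p.max(dim=-1, keepdim=True).values + torch.log(torch.tensor(beta))
    scores = (1 + alpha) * log_p - alpha * log_q
    return scores.masked_fill(log_p < cutoff, float("-inf"))

# Usage: pick the next token from the contrasted scores.
vocab_size = 32000
logits_orig = torch.randn(1, vocab_size)
logits_cf = torch.randn(1, vocab_size)
next_token = contrastive_decode_step(logits_orig, logits_cf).argmax(dim=-1)
```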
Key facts
- SIRA is a training-free internal contrastive decoding framework.
- It mitigates hallucinations in LVLMs without external perturbations.
- It uses a shared prefix to form an aligned multimodal state.
- It forks a counterfactual branch in the later transformer layers (see the sketch after this list).
- The method avoids off-manifold artifacts and extra forward passes.
- The paper is available on arXiv with ID 2605.14621.
- The approach exploits staged information flow in multimodal transformers.
- SIRA preserves prompt interpretation, decoding history, positional structure, and early visual grounding.
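The shared-prefix fork can be sketched as follows: the early layers run once over the aligned multimodal state (preserving prompt interpretation, decoding history, positional structure, and early visual grounding), and only the later layers run twice, once on the untouched hidden states and once on a counterfactual copy. The intervention shown here (damping image-token hidden states at the fork point) is a hypothetical placeholder for SIRA's actual counterfactual construction, and all names (`SharedPrefixFork`, `fork_at`, `visual_mask`) are illustrative.

```python
import torch
import torch.nn as nn

class SharedPrefixFork(nn.Module):
    """Run a shared early stack once, then fork two branches through
    the later layers. The counterfactual intervention is a placeholder."""

    def __init__(self, layers: nn.ModuleList, fork_at: int):
        super().__init__()
        self.early = layers[:fork_at]   # shared prefix: run once
        self.late = layers[fork_at:]    # forked: run twice

    def forward(self, hidden: torch.Tensor, visual_mask: torch.Tensor):
        for layer in self.early:
            hidden = layer(hidden)
        h_orig = hidden
        # Hypothetical counterfactual: zero out hidden states at
        # image-token positions before the late layers.
        h_cf = hidden * (~visual_mask).unsqueeze(-1)
        for layer in self.late:
            h_orig = layer(h_orig)
            h_cf = layer(h_cf)
        return h_orig, h_cf

# Usage with a toy 8-layer stack: 4 image tokens followed by 6 text tokens.
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(64, 4, batch_first=True) for _ in range(8)
)
model = SharedPrefixFork(layers, fork_at=5)
x = torch.randn(1, 10, 64)
mask = torch.tensor([[True] * 4 + [False] * 6])
h_orig, h_cf = model(x, visual_mask=mask)
```

Because the early layers are shared, the second branch costs only the later portion of the stack, which is how an internal fork avoids the full extra forward pass that external-perturbation methods require.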
Entities
Institutions
- arXiv