ARTFEED — Contemporary Art Intelligence

SIRA: Training-Free Method to Reduce LVLM Hallucinations

ai-technology · 2026-05-16

A new approach called SIRA (Shared-Prefix Internal Reconstruction of Attribution) has been introduced as a training-free internal contrastive decoding framework for reducing hallucinations in large vision-language models (LVLMs). Current contrastive decoding techniques compare predictions on the original image against predictions on an externally altered visual input, which can introduce off-manifold artifacts and requires a costly extra forward pass. SIRA instead constructs the counterfactual reference inside the same LVLM: it exploits the staged information flow of multimodal transformers, letting image and text tokens interact through a shared prefix to form an aligned multimodal state before forking a counterfactual branch in later layers. The methodology is detailed in the arXiv paper 2605.14621.
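Contrastive decoding frameworks of this kind typically combine the next-token logits of the original and counterfactual branches at each decoding step. The sketch below shows the standard combination rule with an adaptive plausibility constraint, as used in the general contrastive decoding literature; it is illustrative only, and `alpha` and `beta` are assumed hyperparameters, not values from the SIRA paper.

```python
import numpy as np

def contrastive_logits(logits_orig, logits_cf, alpha=1.0, beta=0.1):
    """Combine original and counterfactual next-token logits.

    Tokens the counterfactual branch over-predicts (hallucination-prone)
    are penalized. An adaptive plausibility mask keeps only tokens whose
    probability under the original branch is within a factor `beta` of
    the top candidate, so the penalty cannot promote implausible tokens.
    """
    contrast = (1.0 + alpha) * logits_orig - alpha * logits_cf
    probs = np.exp(logits_orig - logits_orig.max())
    probs /= probs.sum()
    mask = probs >= beta * probs.max()          # adaptive plausibility cutoff
    return np.where(mask, contrast, -np.inf)    # masked tokens can never win

# Toy example with a 4-token vocabulary: the counterfactual branch
# over-predicts token 1, so the contrast pushes it down.
orig = np.array([2.0, 1.0, 0.5, -1.0])
cf = np.array([2.0, 2.5, 0.2, -1.0])
out = contrastive_logits(orig, cf)
print(int(np.argmax(out)))  # → 0
```

Greedy decoding then takes the argmax of the contrasted logits; in practice the same scores feed a softmax for sampling.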

Key facts

  • SIRA is a training-free internal contrastive decoding framework.
  • It mitigates hallucinations in LVLMs without external perturbations.
  • It uses a shared prefix to form an aligned multimodal state.
  • It forks a counterfactual branch in later transformer layers.
  • The method avoids off-manifold artifacts and extra forward passes.
  • The paper is available on arXiv with ID 2605.14621.
  • The approach exploits staged information flow in multimodal transformers.
  • SIRA preserves prompt interpretation, decoding history, positional structure, and early visual grounding.
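The staged flow in the bullets above can be sketched as one forward pass whose early layers are shared and whose later layers fork into two branches. Everything below is a toy stand-in under stated assumptions: the per-layer maps, the fork depth `k`, and the crude zeroing of visual token states are placeholders for illustration, not SIRA's actual reconstruction mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "transformer": each layer mixes tokens (a crude stand-in for
# attention) and then applies a fixed nonlinear per-token map.
n_layers, dim = 6, 8
layers = [rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(n_layers)]

def run_layers(h, start, end, visual_mask=None):
    """Apply layers[start:end]; optionally ablate visual token states first."""
    if visual_mask is not None:
        h = h * (~visual_mask)[:, None]          # zero out visual tokens at the fork
    for W in layers[start:end]:
        h = np.tanh((h + h.mean(axis=0, keepdims=True)) @ W)  # mix, then transform
    return h

seq_len, n_visual, k = 10, 4, 3   # first 4 tokens are image tokens; fork at layer k
visual_mask = np.zeros(seq_len, dtype=bool)
visual_mask[:n_visual] = True

h0 = rng.standard_normal((seq_len, dim))

# Shared prefix: layers 0..k-1 run once, so image and text tokens interact
# normally and early visual grounding, positions, and history are preserved.
h_shared = run_layers(h0, 0, k)

# Fork in later layers: the original branch keeps everything, while the
# counterfactual branch continues from the same shared state with visual
# information ablated. No second full forward pass is needed.
h_orig = run_layers(h_shared, k, n_layers)
h_cf = run_layers(h_shared, k, n_layers, visual_mask=visual_mask)

print(h_orig.shape, bool(np.allclose(h_orig, h_cf)))
```

Because both branches reuse the shared-prefix computation, the counterfactual comes at the cost of re-running only the later layers, which is how an internal scheme avoids the extra full forward pass that external perturbation requires.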

Entities

Institutions

  • arXiv

Sources