ARTFEED — Contemporary Art Intelligence

Chain of Evidence: Visual Attribution for Iterative RAG

ai-technology · 2026-05-06

Researchers propose Chain of Evidence (CoE), a visual attribution framework for Iterative Retrieval-Augmented Generation (iRAG) that uses Vision-Language Models to reason directly over screenshots of retrieved documents. CoE targets two weaknesses of text-based pipelines: citations that are only coarse-grained at the text level, and the loss of visual semantics when visually rich documents such as slides and PDFs are parsed into text. Because it works on screenshots, it needs no format-specific parsing, and it marks each piece of evidence with a precise bounding box. The system is retriever-agnostic and aims to improve multi-hop question answering by preserving spatial logic and layout cues.
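The iterative loop described above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Evidence`, `StepResult`, and `chain_of_evidence`, the normalized bounding-box format, and the toy retriever and reasoner are all hypothetical. The retriever and the Vision-Language Model are passed in as plain callables, which is one simple way to keep the loop retriever-agnostic.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration; none of these names come from the paper.
# Assumed box format: (x0, y0, x1, y1), normalized to [0, 1].
BBox = Tuple[float, float, float, float]


@dataclass
class Evidence:
    page_id: str  # identifier of the retrieved page screenshot
    bbox: BBox    # region on the screenshot that supports this step
    note: str     # what the model read from that region


@dataclass
class StepResult:
    evidence: List[Evidence]
    answer: Optional[str]      # set once the model can answer
    next_query: Optional[str]  # follow-up query for the next hop


Retriever = Callable[[str], List[str]]  # query -> page ids
Reasoner = Callable[[str, List[str], List[Evidence]], StepResult]


def chain_of_evidence(question: str, retrieve: Retriever, reason: Reasoner,
                      max_hops: int = 4) -> Tuple[Optional[str], List[Evidence]]:
    """Alternate retrieval and visual reasoning, collecting boxes per hop."""
    chain: List[Evidence] = []
    query = question
    for _ in range(max_hops):
        pages = retrieve(query)                 # any retriever can be plugged in
        step = reason(question, pages, chain)   # stand-in for a VLM call
        chain.extend(step.evidence)
        if step.answer is not None:
            return step.answer, chain
        if not step.next_query:
            break
        query = step.next_query
    return None, chain


# Toy stand-ins (invented data) so the sketch runs without a model or an index.
def toy_retrieve(query: str) -> List[str]:
    index = {
        "Who founded the company in the chart?": ["slide_3"],
        "founder of Acme": ["pdf_p2"],
    }
    return index.get(query, [])


def toy_reason(question: str, pages: List[str],
               chain: List[Evidence]) -> StepResult:
    if "slide_3" in pages:
        ev = Evidence("slide_3", (0.25, 0.125, 0.5, 0.25), "chart title names Acme")
        return StepResult([ev], None, "founder of Acme")
    if "pdf_p2" in pages:
        ev = Evidence("pdf_p2", (0.0, 0.5, 1.0, 0.75), "Acme was founded by J. Doe")
        return StepResult([ev], "J. Doe", None)
    return StepResult([], None, None)


answer, chain = chain_of_evidence(
    "Who founded the company in the chart?", toy_retrieve, toy_reason
)
```

In this toy run the question takes two hops, and the returned chain holds one boxed region per hop, so each step of the answer can be traced back to a location on a page image.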

Key facts

  • Chain of Evidence (CoE) is a visual attribution framework for iRAG.
  • CoE uses Vision-Language Models to reason over document screenshots.
  • It addresses coarse-grained text citations and visual semantic loss.
  • CoE outputs precise bounding boxes for evidence.
  • It is retriever-agnostic and eliminates format-specific parsing.
  • The framework targets multi-hop question answering.
  • CoE preserves spatial logic and layout cues from visually rich documents.
  • The research is published on arXiv with ID 2605.01284.
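To make the bounding-box output concrete, here is a small Python sketch of how a box could be mapped onto a screenshot so the cited region can be highlighted or cropped. The summary does not say which coordinate convention CoE uses; the normalized `(x0, y0, x1, y1)` format and the helper name `to_pixels` are assumptions made for illustration.

```python
import math
from typing import Tuple

# Assumed box format (not specified in the source):
# (x0, y0, x1, y1) with each value normalized to [0, 1].
BBox = Tuple[float, float, float, float]


def to_pixels(bbox: BBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to pixel coordinates on a width x height image.

    The left/top edges are floored and the right/bottom edges are ceiled,
    so rounding never trims the evidence region.
    """
    x0, y0, x1, y1 = bbox
    if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
        raise ValueError(f"invalid normalized box: {bbox}")
    return (
        math.floor(x0 * width),
        math.floor(y0 * height),
        math.ceil(x1 * width),
        math.ceil(y1 * height),
    )


# Example: a box on a hypothetical 1600 x 800 screenshot.
pixel_box = to_pixels((0.25, 0.125, 0.5, 0.75), 1600, 800)
```

The resulting `(left, upper, right, lower)` tuple follows the ordering that common image libraries expect for cropping, which is why the sketch returns it in that order.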

Entities

Institutions

  • arXiv

Sources