EAGLE: Multi-Agent VLM Consensus via Visual Evidence Alignment
A research paper on arXiv (2605.30698) proposes EAGLE (Evidence-Aligned Grounded muLti-agent rEasoning), a training-free method for reaching consensus among multiple vision-language model (VLM) agents. The authors argue that answer-level agreement is insufficient for reliable visual question answering (VQA): agents should also agree on the visual evidence, meaning the image regions that support their answers. Multi-agent collaboration can mitigate individual hallucinations by aggregating diverse perspectives, but existing multi-agent VQA approaches adapt text-centric discussion protocols and ignore whether the agents' visual evidence is aligned. EAGLE instead centers the consensus process on evidence alignment, and is presented as a route to trustworthy consensus in multimodal domains.
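The difference between answer-level agreement and evidence alignment can be illustrated with a minimal Python sketch. This is not the paper's algorithm, which the summary above does not describe in detail. It assumes each agent returns an answer plus one bounding box, and it uses pairwise intersection-over-union (IoU) with an arbitrary 0.5 threshold as a stand-in for alignment. All names (`AgentOutput`, `evidence_aligned`, `grounded_consensus`) are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class AgentOutput(NamedTuple):
    """Hypothetical per-agent result: an answer and the region cited as evidence."""
    answer: str
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def answers_agree(outputs):
    """Answer-level agreement: every agent gives the same normalized answer."""
    return len({o.answer.strip().lower() for o in outputs}) == 1


def evidence_aligned(outputs, threshold=0.5):
    """Assumed alignment test: every pair of cited regions overlaps enough."""
    return all(iou(a.box, b.box) >= threshold for a, b in combinations(outputs, 2))


def grounded_consensus(outputs, threshold=0.5):
    """Accept a consensus only if answers match AND the evidence is aligned."""
    return answers_agree(outputs) and evidence_aligned(outputs, threshold)


# Two agents give the same answer but point at unrelated image regions.
disjoint = [
    AgentOutput("red", (0, 0, 10, 10)),
    AgentOutput("Red", (100, 100, 110, 110)),
]
# Two agents give the same answer and cite nearly the same region.
overlapping = [
    AgentOutput("red", (0, 0, 10, 10)),
    AgentOutput("red", (1, 1, 10, 10)),
]
```

In this sketch, `disjoint` passes the answer-level check but fails `grounded_consensus`, which is the failure mode the authors say answer-only agreement cannot detect. `overlapping` passes both checks.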
Key facts
- Paper arXiv:2605.30698 proposes EAGLE
- EAGLE is a training-free multi-agent VLM method
- Focuses on aligning visual evidence across agents
- Argues answer-level agreement is insufficient for VQA
- Addresses a gap in multi-agent VQA: prior methods do not align visual information across agents
- Aims to mitigate individual VLM hallucinations
- Contrasts with text-centric multi-agent protocols
- Announced on arXiv as a cross-listing (announce type: cross)
Entities
Institutions
- arXiv