ARTFEED — Contemporary Art Intelligence

MED-VRAG: Iterative Multimodal RAG for Medical QA

other · 2026-05-01

Researchers have unveiled MED-VRAG, an iterative multimodal RAG system that retrieves and reasons over images of PMC document pages rather than OCR-extracted text. The pipeline combines ColQwen2.5 patch-level page embeddings with a sharded MapReduce LLM filter, scaling to roughly 350,000 pages while keeping stage-1 retrieval under 30 milliseconds via a coarse-to-fine index. A vision-language model then iteratively refines the query and accumulates evidence over up to three reasoning rounds, at about 15.9 seconds per round (47.8 seconds for all three rounds on 4xA100). The system is evaluated on four medical QA benchmarks: MedQA, MedMCQA, PubMedQA, and MMLU-M.
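Patch-level page embeddings are typically scored with ColBERT-style late interaction ("MaxSim"), where each query token is matched against its most similar page patch. The sketch below illustrates that scoring scheme; the shapes and function names are illustrative assumptions, not MED-VRAG's actual implementation.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score of one page against a query.

    query_emb: (n_query_tokens, dim) L2-normalized query token embeddings.
    page_emb:  (n_patches, dim)      L2-normalized page patch embeddings.

    Each query token is matched to its most similar page patch; the
    per-token maxima are summed to give the page score.
    """
    sims = query_emb @ page_emb.T          # (n_query_tokens, n_patches)
    return float(sims.max(axis=1).sum())   # max over patches, sum over tokens

def rank_pages(query_emb: np.ndarray, pages: list) -> list:
    """Return page indices sorted by descending MaxSim score."""
    scores = [maxsim_score(query_emb, p) for p in pages]
    return sorted(range(len(pages)), key=lambda i: -scores[i])
```

A coarse-to-fine index would run a cheap approximate pass first (e.g. pooled page vectors) and apply this exact MaxSim scoring only to the surviving candidates, which is how sub-30 ms stage-1 retrieval over hundreds of thousands of pages becomes plausible.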

Key facts

  • MED-VRAG is an iterative multimodal RAG framework.
  • It retrieves and reasons over PMC document page images.
  • It uses ColQwen2.5 patch-level page embeddings.
  • It employs a sharded MapReduce LLM filter.
  • Scales to ~350K pages.
  • Stage-1 retrieval under 30 ms via coarse-to-fine index.
  • VLM iteratively refines query across up to 3 rounds.
  • Evaluated on MedQA, MedMCQA, PubMedQA, MMLU-M.
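The retrieve → filter → refine loop in the facts above can be sketched as follows. `retrieve_pages`, `llm_judge_relevant`, and `vlm_refine` are hypothetical stand-ins for MED-VRAG's components; the shard size and control flow are assumptions for illustration.

```python
def mapreduce_filter(query, pages, llm_judge_relevant, shard_size=8):
    """Map: judge each shard of candidate pages with an LLM relevance check
    (shards could run in parallel). Reduce: concatenate the surviving pages."""
    kept = []
    for i in range(0, len(pages), shard_size):
        shard = pages[i:i + shard_size]
        kept.extend(p for p in shard if llm_judge_relevant(query, p))
    return kept

def answer(question, retrieve_pages, llm_judge_relevant, vlm_refine, max_rounds=3):
    """Iterate: retrieve page images, filter them, let the VLM either answer
    or rewrite the query for another round (up to max_rounds)."""
    query, evidence = question, []
    for rnd in range(max_rounds):
        candidates = retrieve_pages(query)                # fast stage-1 retrieval
        evidence.extend(mapreduce_filter(query, candidates, llm_judge_relevant))
        last_round = rnd == max_rounds - 1
        # VLM refines the query; on the last round it must commit to an answer.
        query, final = vlm_refine(question, evidence, must_answer=last_round)
        if final is not None:
            return final
```

The early-return explains the reported timings: at roughly 15.9 s per round, a question that the VLM can answer after one round costs a third of the 47.8 s three-round worst case.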

Entities

Institutions

  • arXiv

Sources