MED-VRAG: Iterative Multimodal RAG for Medical QA
MED-VRAG is an iterative multimodal RAG framework for medical QA that retrieves and reasons directly over images of PMC document pages rather than OCR-extracted text. Retrieval combines ColQwen2.5 patch-level page embeddings with a sharded MapReduce LLM filter, scaling to roughly 350,000 pages while a coarse-to-fine index keeps stage-1 retrieval under 30 ms. A vision-language model then iteratively refines the query and accumulates evidence over up to three reasoning rounds (about 15.9 s per round, 47.8 s total on 4xA100). The system is evaluated on four medical QA benchmarks: MedQA, MedMCQA, PubMedQA, and MMLU-M.
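The retrieve-filter-reason loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`retrieve_pages`, `filter_pages`, `vlm_answer`) and the stopping heuristic are hypothetical stand-ins for the coarse-to-fine retriever, the MapReduce LLM filter, and the VLM reasoning step.

```python
# Hypothetical sketch of an iterative multimodal RAG loop in the style of
# MED-VRAG. All functions below are illustrative stubs, not the real system.

def retrieve_pages(query, k=20):
    # Stage 1: fast coarse-to-fine retrieval over page-image embeddings.
    return [f"page_{i}" for i in range(k)]

def filter_pages(query, pages):
    # Stage 2: sharded MapReduce-style LLM filter keeps only relevant pages.
    return pages[: len(pages) // 2]

def vlm_answer(query, evidence):
    # VLM reasons over the retrieved page images; returns an answer once it
    # has enough evidence, plus a refined query for the next round.
    refined_query = query + " (refined)"
    answer = "B" if len(evidence) >= 25 else None  # toy stopping rule
    return answer, refined_query

def med_vrag(question, max_rounds=3):
    query, evidence, answer = question, [], None
    for _ in range(max_rounds):
        pages = filter_pages(query, retrieve_pages(query))
        evidence.extend(pages)
        answer, query = vlm_answer(query, evidence)
        if answer is not None:  # stop early once the VLM commits to an answer
            break
    return answer
```

Each round adds filtered evidence and rewrites the query, which is what lets the system recover from a poor initial retrieval within its three-round budget.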
Key facts
- MED-VRAG is an iterative multimodal RAG framework.
- It retrieves and reasons over PMC document page images.
- It uses ColQwen2.5 patch-level page embeddings.
- It employs a sharded MapReduce LLM filter.
- Scales to ~350K pages.
- Stage-1 retrieval under 30 ms via coarse-to-fine index.
- VLM iteratively refines query across up to 3 rounds.
- Evaluated on MedQA, MedMCQA, PubMedQA, MMLU-M.
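Patch-level page embeddings such as ColQwen2.5's are typically scored with late interaction (ColBERT/ColPali-style MaxSim): each query-token embedding is matched to its most similar page patch, and the per-token maxima are summed. The sketch below shows that scoring rule under assumed shapes; the dimensions and the NumPy implementation are illustrative, not taken from the paper.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction MaxSim score.

    query_emb: (Q, d) array of query-token embeddings.
    page_emb:  (P, d) array of page-patch embeddings.
    Each query token takes its best-matching patch; scores are summed.
    """
    sims = query_emb @ page_emb.T          # (Q, P) dot-product similarities
    return float(sims.max(axis=1).sum())   # best patch per token, then sum

# Toy ranking example with random embeddings (dimensions are made up).
rng = np.random.default_rng(0)
query = rng.standard_normal((8, 128))                       # 8 query tokens
pages = [rng.standard_normal((64, 128)) for _ in range(3)]  # 64 patches/page
best_page = max(range(len(pages)), key=lambda i: maxsim_score(query, pages[i]))
```

A coarse-to-fine index would first shortlist candidate pages cheaply (e.g. with pooled single-vector embeddings) and apply this exact MaxSim scoring only to the shortlist, which is one plausible way to keep stage-1 latency under 30 ms at ~350K pages.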
Entities
Institutions
- arXiv