Retrieval-Guided Generation Improves Histopathology Caption Safety
A recent study posted to arXiv (2605.00893) introduces retrieval-guided generation (RGG) as a safer alternative to purely generative vision-language models for captioning histopathology images. Rather than generating captions from scratch, RGG synthesizes them by summarizing the expert descriptions attached to visually similar cases, which reduces hallucinations and unfounded diagnostic assertions. On the ARCH dataset, RGG achieved a cosine similarity of roughly 0.60 versus roughly 0.47 for MedGemma, with non-overlapping confidence intervals. A pathologist-led review found better preservation of morphology terminology and fewer unsupported diagnoses, though failure modes such as concept mixing and inherited over-specific labeling were observed. The method offers a more transparent and dependable approach to medical image captioning.
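The paper's code is not reproduced here; the following is a minimal sketch of the retrieval step it describes, in which visually similar reference cases are ranked by cosine similarity over image embeddings and their expert captions collected for summarization. The function names, the 3-d toy embeddings, and the placeholder captions are all illustrative assumptions, standing in for a real image encoder and case library.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_similar_captions(query_emb, case_embs, expert_captions, k=3):
    """Return the expert captions of the k reference cases whose image
    embeddings are most similar (by cosine) to the query image embedding."""
    scores = [cosine_sim(query_emb, e) for e in case_embs]
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return [expert_captions[i] for i in top_k]

# Toy reference set: hand-made 3-d "image embeddings" stand in for a real encoder
case_embs = [
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 1.0, 0.0]),
    np.array([0.7, 0.7, 0.1]),
    np.array([0.0, 0.0, 1.0]),
]
expert_captions = [f"expert description {i}" for i in range(4)]
query = np.array([0.69, 0.72, 0.08])  # closest in direction to case 2

print(retrieve_similar_captions(query, case_embs, expert_captions, k=2))
```

In the full pipeline, the retrieved captions would then be summarized into a single caption for the query image, rather than being emitted verbatim.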
Key facts
- arXiv paper 2605.00893 proposes retrieval-guided generation for histopathology captioning
- RGG summarizes expert text from visually similar cases instead of generating captions de novo
- On ARCH dataset, RGG achieved cosine similarity ~0.60 vs ~0.47 for MedGemma
- Confidence intervals were non-overlapping, indicating a robust gain
- Pathologist-led review showed better preservation of morphology terminology
- Fewer unsupported diagnoses were found with RGG
- Failure modes included concept mixing and inherited over-specific labeling
- RGG offers a more transparent and reliable approach
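The cosine-similarity scores above compare embeddings of generated captions against reference captions. As a minimal illustration of how the metric behaves, the sketch below uses a toy bag-of-words embedding; the paper would use a learned text encoder, so the absolute values here are not comparable to the reported 0.60 and 0.47.

```python
import numpy as np
from collections import Counter

def bow_embed(text: str, vocab: list) -> np.ndarray:
    # Toy bag-of-words vector; a learned text encoder would be used in practice
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical reference and generated captions for one image
reference = "dense lymphoid infiltrate with atypical cells"
candidate = "lymphoid infiltrate with scattered atypical cells"
vocab = sorted(set(reference.split()) | set(candidate.split()))

score = cosine_similarity(bow_embed(reference, vocab), bow_embed(candidate, vocab))
print(round(score, 3))  # → 0.833 (5 shared words out of 6 per caption)
```

Scores near 1.0 indicate close agreement with the reference; the reported gap (~0.60 vs ~0.47) reflects this kind of embedding-space comparison averaged over the test set.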
Entities
Institutions
- arXiv