Retrieval-Guided Generation Improves Histopathology Caption Safety
A recent study posted to arXiv (2605.00893) introduces retrieval-guided generation (RGG) as a safer alternative to purely generative vision-language models for captioning histopathology images. Rather than generating captions from scratch, RGG synthesizes them by summarizing the expert descriptions attached to visually similar cases, which reduces hallucinations and unfounded diagnostic assertions. On the ARCH dataset, RGG achieved a cosine similarity of roughly 0.60 versus roughly 0.47 for MedGemma, with non-overlapping confidence intervals. A pathologist-led review found better preservation of morphology terminology and fewer unsupported diagnoses, though failure modes such as concept mixing and inherited over-specific labeling were observed. The method offers a more transparent and dependable approach to medical image captioning.
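The paper's code is not reproduced here; the following is a minimal sketch of the retrieval step it describes, in which visually similar reference cases are ranked by cosine similarity over image embeddings and their expert captions collected for summarization. The function names, the 3-d toy embeddings, and the placeholder captions are all illustrative assumptions, standing in for a real image encoder and case library.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_similar_captions(query_emb, case_embs, expert_captions, k=3):
    """Return the expert captions of the k reference cases whose image
    embeddings are most similar (by cosine) to the query image embedding."""
    scores = [cosine_sim(query_emb, e) for e in case_embs]
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return [expert_captions[i] for i in top_k]

# Toy reference set: hand-made 3-d "image embeddings" stand in for a real encoder
case_embs = [
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 1.0, 0.0]),
    np.array([0.7, 0.7, 0.1]),
    np.array([0.0, 0.0, 1.0]),
]
expert_captions = [f"expert description {i}" for i in range(4)]
query = np.array([0.69, 0.72, 0.08])  # closest in direction to case 2

print(retrieve_similar_captions(query, case_embs, expert_captions, k=2))
```

In the full pipeline, the retrieved captions would then be summarized into a single caption for the query image, rather than being emitted verbatim.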
Key facts
- arXiv paper 2605.00893 proposes retrieval-guided generation for histopathology captioning
- RGG summarizes expert text from visually similar cases instead of generating captions de novo
- On ARCH dataset, RGG achieved cosine similarity ~0.60 vs ~0.47 for MedGemma
- Confidence intervals were non-overlapping, indicating a robust gain
- Pathologist-led review showed better preservation of morphology terminology
- Fewer unsupported diagnoses were found with RGG
- Failure modes included concept mixing and inherited over-specific labeling
- RGG offers a more transparent and reliable approach
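The cosine-similarity scores above compare embeddings of generated captions against reference captions. As a minimal illustration of how the metric behaves, the sketch below uses a toy bag-of-words embedding; the paper would use a learned text encoder, so the absolute values here are not comparable to the reported 0.60 and 0.47.

```python
import numpy as np
from collections import Counter

def bow_embed(text: str, vocab: list) -> np.ndarray:
    # Toy bag-of-words vector; a learned text encoder would be used in practice
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical reference and generated captions for one image
reference = "dense lymphoid infiltrate with atypical cells"
candidate = "lymphoid infiltrate with scattered atypical cells"
vocab = sorted(set(reference.split()) | set(candidate.split()))

score = cosine_similarity(bow_embed(reference, vocab), bow_embed(candidate, vocab))
print(round(score, 3))  # → 0.833 (5 shared words out of 6 per caption)
```

Scores near 1.0 indicate close agreement with the reference; the reported gap (~0.60 vs ~0.47) reflects this kind of embedding-space comparison averaged over the test set.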
Entities
Institutions
- arXiv