FundusGround: A Benchmark for Interpretable Ophthalmic VQA

other · 2026-05-23

FundusGround is a benchmark for clinically interpretable ophthalmic Visual Question Answering (VQA), built around spatially grounded lesion evidence. A three-step pipeline yields 10,719 fundus images containing 15,595 annotated lesions, each localized on the Early Treatment Diabetic Retinopathy Study (ETDRS) grid for anatomical consistency. The grid provides a standardized mapping of lesions to nine retinal regions. From this structured data, 72,706 questions are generated in four formats, with the aim of making ophthalmic AI systems more interpretable.
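To make the ETDRS-grid localization described above concrete, the following is a minimal sketch of how a lesion centroid could be assigned to one of the nine regions. It is not the benchmark's own code. It assumes the standard ETDRS layout (a central 1 mm circle plus inner and outer rings of 3 mm and 6 mm diameter, each split into superior, nasal, inferior, and temporal quadrants), a known fovea position, a known millimetre-per-pixel scale, and an un-flipped photograph in which the nasal side lies to the right of the fovea for a right eye. The function name `etdrs_region`, its parameters, and the region labels are hypothetical.

```python
import math

# Standard ETDRS ring radii in millimetres (diameters of 1, 3 and 6 mm).
CENTER_R_MM = 0.5
INNER_R_MM = 1.5
OUTER_R_MM = 3.0


def etdrs_region(x, y, fovea_x, fovea_y, mm_per_pixel, eye="right"):
    """Return a hypothetical ETDRS region label for a pixel, or None if off-grid.

    Assumes image coordinates with x growing rightward and y growing downward.
    """
    dx = (x - fovea_x) * mm_per_pixel
    dy = (fovea_y - y) * mm_per_pixel  # flip y so that positive dy means superior
    r = math.hypot(dx, dy)

    if r <= CENTER_R_MM:
        return "central"
    if r > OUTER_R_MM:
        return None  # outside the 6 mm grid

    ring = "inner" if r <= INNER_R_MM else "outer"
    angle = math.degrees(math.atan2(dy, dx)) % 360

    if 45 <= angle < 135:
        quadrant = "superior"
    elif 225 <= angle < 315:
        quadrant = "inferior"
    else:
        # Horizontal quadrants depend on laterality (assumption: in a right-eye
        # photograph the nasal side is to the right of the fovea).
        on_right = angle < 45 or angle >= 315
        if eye == "right":
            quadrant = "nasal" if on_right else "temporal"
        else:
            quadrant = "temporal" if on_right else "nasal"

    return f"{ring}_{quadrant}"


# Example: a lesion 100 px to the right of the fovea at 0.01 mm per pixel.
print(etdrs_region(200, 100, 100, 100, 0.01, eye="right"))
```

The one central region plus four quadrants in each of the two rings give the nine regions. A lesion that spans several regions would need a rule of its own (for example, assigning by centroid or by largest overlap), which this sketch leaves open.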

Key facts

FundusGround is a benchmark for clinically interpretable ophthalmic VQA
It uses spatially-grounded lesion evidence
10,719 fundus images with 15,595 annotated lesions
Lesions localized using ETDRS grid
Mapping to nine retinal regions
72,706 questions generated in four formats
Focuses on interpretability over answer accuracy
Aims to support clinical decision-making
