Ultrasound VQA Enhanced by Active Zooming and Uncertainty Awareness

ai-technology · 2026-05-23

A new framework for ultrasound visual question answering (VQA) improves Vision-Language Model (VLM) performance by mimicking the cognitive workflow of sonographers. It introduces a Zoom-then-Diagnose paradigm in which the model interactively focuses on lesion regions before making a diagnosis, addressing the lack of structured, lesion-focused reasoning in existing VLMs. It also incorporates uncertainty-aware rewards into the Group Relative Policy Optimization (GRPO) framework, so that medical annotations, which are inherently subjective and ambiguous, are not treated as unbiased ground truth. The work, published as arXiv:2605.21652, targets suboptimal VLM performance on ultrasound by replicating the interactive search process used in clinical practice.
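To make the reward idea concrete, here is a minimal sketch of how an uncertainty-aware reward could feed GRPO's group-relative advantage. The summary does not give the paper's actual reward formula, so the soft-label reward below (the fraction of annotators who agree with a sampled answer) is purely an assumption for illustration, as are the function names and the toy votes. Only the advantage step, which normalizes each reward against the mean and standard deviation of its sampled group, follows the standard GRPO formulation.

```python
from statistics import mean, pstdev


def soft_label_reward(predicted, annotator_votes):
    """Hypothetical uncertainty-aware reward: the share of annotators
    who gave the same answer as the model. A hard 0/1 reward would
    instead treat a single label as unbiased ground truth."""
    if not annotator_votes:
        raise ValueError("need at least one annotator vote")
    return annotator_votes.count(predicted) / len(annotator_votes)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward is normalized by the mean and
    standard deviation of the group sampled for the same question.
    Population standard deviation is used here as a simplification."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy data (assumed): three annotators disagree on one lesion.
votes = ["benign", "benign", "malignant"]
# Four answers sampled from the policy for the same question.
group = ["benign", "malignant", "benign", "uncertain"]

rewards = [soft_label_reward(answer, votes) for answer in group]
advantages = group_relative_advantages(rewards)

for answer, reward, advantage in zip(group, rewards, advantages):
    print(f"{answer:10s} reward={reward:.3f} advantage={advantage:+.3f}")
```

Under this assumed reward, the minority answer "malignant" still earns partial credit instead of zero, which is one way ambiguity in the annotations could soften the training signal.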

Key facts

Proposes Zoom-then-Diagnose paradigm for lesion-focused reasoning
Uses uncertainty-aware rewards in GRPO framework
Addresses subjectivity in medical annotations
Targets ultrasound VQA performance improvement
Published as arXiv:2605.21652
Replicates sonographers' cognitive workflow
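The zooming step of Zoom-then-Diagnose can be pictured as cropping the image to a candidate lesion region before it is passed on for diagnosis. The sketch below is only an illustration under assumptions: the image is a plain list of pixel rows, the bounding box is hard-coded rather than proposed by a model, and the helper name `zoom` and its `margin` parameter are hypothetical, not taken from the paper.

```python
def zoom(image, box, margin=1):
    """Crop `image` (a list of equal-length rows) to `box`, given as
    (top, left, bottom, right) with exclusive bottom/right edges.
    The box is padded by `margin` pixels and clamped to the image."""
    top, left, bottom, right = box
    height, width = len(image), len(image[0])
    top, left = max(0, top - margin), max(0, left - margin)
    bottom, right = min(height, bottom + margin), min(width, right + margin)
    return [row[left:right] for row in image[top:bottom]]


# Toy 6x6 "scan" with two bright pixels standing in for a lesion.
image = [[0] * 6 for _ in range(6)]
image[2][3] = 9
image[3][3] = 7

# Assumed lesion box; in the described paradigm the model would choose it.
crop = zoom(image, box=(2, 3, 4, 4))
for row in crop:
    print(row)
```

The cropped region, rather than the full frame, would then be the input to the diagnosis step, which is the part of the workflow this toy example leaves out.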
