BiomedAP: Dual-Anchor Framework for Robust Medical Vision-Language Adaptation

ai-technology · 2026-05-18

A team of researchers has introduced BiomedAP, a dual-anchor framework that uses vision-informed gated cross-modal fusion to reduce the sensitivity of biomedical Vision-Language Models (VLMs) to prompt variations. Current adaptation methods tend to optimize visual and textual prompts separately, which leads to inconsistent cross-modal alignment when clinical descriptions are noisy. BiomedAP couples the two modalities through gated cross-modal fusion, which provides layer-wise interaction, and adds a dual-anchor constraint that regularizes prompts toward stable semantic centroids derived from expert templates and few-shot examples. The framework's primary goal is to make few-shot medical diagnosis more robust.
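To make the idea of gated cross-modal fusion concrete, here is a minimal sketch in plain Python. It is not the BiomedAP implementation: the function name `gated_fusion`, the scalar gate, and the convex blend of the two prompt vectors are all assumptions chosen for illustration. The only idea taken from the summary is that a gate informed by the visual side controls how much cross-modal signal reaches the text prompt.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_prompt, visual_prompt, gate_weights, gate_bias=0.0):
    """Blend a visual prompt vector into a text prompt vector via a scalar gate.

    Hypothetical formulation (not taken from the paper): the gate is computed
    from the visual vector only, then used for a convex blend of both vectors.
    """
    if not (len(text_prompt) == len(visual_prompt) == len(gate_weights)):
        raise ValueError("all vectors must have the same length")

    # Gate in (0, 1): a linear score of the visual features passed through a sigmoid.
    score = sum(w * v for w, v in zip(gate_weights, visual_prompt)) + gate_bias
    gate = sigmoid(score)

    # gate near 0 keeps the text prompt; gate near 1 lets the visual prompt dominate.
    fused = [(1.0 - gate) * t + gate * v for t, v in zip(text_prompt, visual_prompt)]
    return fused, gate


# With zero weights and zero bias the gate is exactly 0.5, so the result is the average.
fused, gate = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_weights=[0.0, 0.0])
print(gate, fused)
```

In a layer-wise setting, a function like this would be applied once per layer with its own gate parameters, so that each layer can decide how much cross-modal signal to admit.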

Key facts

Biomedical VLMs show promise in few-shot medical diagnosis but are fragile to prompt variations.
Existing frameworks optimize visual and textual prompts as independent streams.
Modality isolation leads to unstable cross-modal alignment when clinical descriptions are noisy.
BiomedAP uses gated cross-modal fusion for dynamic noise regulation.
Dual-anchor constraint regularizes prompts toward stable semantic centroids.
High Anchors are derived from expert templates.
The framework aims to improve robustness under real-world clinical conditions.
Proposed in arXiv:2605.15736.
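
The dual-anchor constraint listed above can be pictured as a penalty that pulls a learned prompt toward two centroids. The sketch below is an illustrative assumption, not the paper's loss: the names `centroid` and `dual_anchor_penalty`, the squared Euclidean distance, and the two weights are hypothetical. It takes from the summary only that one centroid comes from expert templates and the other from few-shot examples.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dual_anchor_penalty(prompt, template_embeddings, few_shot_embeddings,
                        w_template=1.0, w_few_shot=1.0):
    """Weighted sum of squared distances from the prompt to two anchors.

    Hypothetical form: one anchor is the centroid of expert-template embeddings,
    the other is the centroid of few-shot example embeddings.
    """
    template_anchor = centroid(template_embeddings)
    few_shot_anchor = centroid(few_shot_embeddings)
    return (w_template * squared_distance(prompt, template_anchor)
            + w_few_shot * squared_distance(prompt, few_shot_anchor))


# Toy 2-D embeddings; real embeddings would come from the VLM encoders.
penalty = dual_anchor_penalty(
    prompt=[0.0, 0.0],
    template_embeddings=[[1.0, 0.0], [1.0, 2.0]],
    few_shot_embeddings=[[0.0, 2.0]],
)
print(penalty)
```

Added to a task loss during prompt tuning, a term of this shape would discourage prompts from drifting far from either anchor. The relative weights are a design choice that the summary does not specify.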

Entities

—

Sources

arXiv cs.AI — 2026-05-18