BiomedCLIP Weak Labels Show Sharp Crossover on Medical Imaging Benchmarks

other · 2026-05-26

A recent arXiv paper examines the point at which weak supervision from vision-language models shifts from beneficial to detrimental in medical imaging. The study uses weak labels generated by BiomedCLIP on three benchmarks (PCAM, ISIC, and NIH-CXR) and evaluates six downstream architectures spanning an 11x range in parameter count. The crossover occurs at roughly 100 samples on PCAM, 20-50 on ISIC, and 250-500 on NIH-CXR. Above the crossover, weak labels degrade AUC by up to 0.10. The crossover location is architecture-invariant for four of the five pretrained architectures, and a DenseNet sweep (2.5x parameters, identical pretraining) supports the theoretical prediction. The work turns a prediction from classical noisy-label theory into an instance-level statement about modern foundation-model labelers.

Key facts

Study calibrates noisy-label crossover for BiomedCLIP weak labels
Three benchmarks used: PCAM, ISIC, NIH-CXR
Six downstream architectures tested across 11x parameter range
Crossover at ~100 samples on PCAM, 20-50 on ISIC, 250-500 on NIH-CXR
Weak labels above the crossover degrade AUC by up to 0.10
Crossover location architecture-invariant for four of five pretrained architectures
DenseNet sweep (2.5x parameters, identical pretraining) supports theory
Turns theoretical prediction into instance-level statement for foundation-model labelers
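The summary does not say how the crossover is computed. The sketch below illustrates one simple way to locate such a point. It assumes two learning curves measured at the same training-set sizes: AUC with weak labels and AUC without them. It reports the first size at which the weak-label advantage turns non-positive, using linear interpolation between the two bracketing sizes. The function name `find_crossover`, the curve names, and the interpolation choice are all hypothetical. They are not the paper's method, and the demo numbers are toy values, not reported results.

```python
from typing import Optional, Sequence


def find_crossover(
    sample_sizes: Sequence[float],
    auc_weak: Sequence[float],
    auc_clean: Sequence[float],
) -> Optional[float]:
    """Hypothetical helper: return the sample size where the gap
    auc_weak - auc_clean first changes from positive (weak labels help)
    to non-positive (weak labels hurt), linearly interpolated between the
    two bracketing sample sizes. Returns None if no such change occurs."""
    if not (len(sample_sizes) == len(auc_weak) == len(auc_clean)):
        raise ValueError("all inputs must have the same length")
    gaps = [w - c for w, c in zip(auc_weak, auc_clean)]
    for i in range(1, len(gaps)):
        prev, cur = gaps[i - 1], gaps[i]
        if prev > 0 and cur <= 0:
            n0, n1 = sample_sizes[i - 1], sample_sizes[i]
            # Zero of the straight line through (n0, prev) and (n1, cur).
            return n0 + (n1 - n0) * prev / (prev - cur)
    return None


if __name__ == "__main__":
    # Toy curves for illustration only; not numbers from the paper.
    sizes = [10, 100, 1000]
    weak = [0.80, 0.80, 0.80]
    clean = [0.70, 0.90, 0.95]
    print(find_crossover(sizes, weak, clean))
```

Because sample sizes in such sweeps are often log-spaced, interpolating in log(n) instead of n would be an equally reasonable choice. With coarse grids, the two choices can place the crossover noticeably differently.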
