BiomedCLIP Weak Labels Show Sharp Crossover on Medical Imaging Benchmarks
A recent arXiv paper examines the point at which weak supervision from vision-language models shifts from helping to hurting in medical imaging. The study uses weak labels generated by BiomedCLIP on three benchmarks (PCAM, ISIC, and NIH-CXR) and evaluates six downstream architectures spanning an 11x range in parameter count. It finds that the crossover occurs at around 100 samples on PCAM, 20-50 on ISIC, and 250-500 on NIH-CXR. Beyond these thresholds, weak labels can reduce AUC by as much as 0.10. The crossover location is consistent across four of the five pretrained architectures, and a DenseNet sweep (2.5x parameters, identical pretraining) supports the theoretical prediction. The study turns classical noisy-label theory into a practical, instance-level statement for contemporary foundation-model labelers.
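To make the notion of a crossover concrete, the sketch below locates it from two learning curves. It assumes, as one plausible reading of the summary, that the crossover is the first sample size at which a model trained without weak labels matches or overtakes one trained with them. The function name `find_crossover` and all AUC values are hypothetical and purely illustrative; they are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def find_crossover(
    sizes: Sequence[int],
    auc_clean: Sequence[float],
    auc_weak: Sequence[float],
) -> Optional[Tuple[int, int]]:
    """Return the (lower, upper) sample-size bracket in which the
    clean-label curve first catches up with the weak-label curve.

    Returns None if the weak-label advantage never disappears
    (or never existed) on the evaluated grid.
    """
    if not (len(sizes) == len(auc_clean) == len(auc_weak)):
        raise ValueError("all inputs must have the same length")

    # Positive gap: weak labels help. Zero or negative gap: they no longer do.
    gaps = [weak - clean for clean, weak in zip(auc_clean, auc_weak)]

    for i in range(1, len(sizes)):
        if gaps[i - 1] > 0 and gaps[i] <= 0:
            return sizes[i - 1], sizes[i]
    return None


# Synthetic, made-up curves for illustration only (NOT results from the paper).
sizes = [10, 20, 50, 100, 250, 500]
auc_clean = [0.60, 0.66, 0.74, 0.80, 0.85, 0.88]
auc_weak = [0.72, 0.74, 0.77, 0.79, 0.80, 0.80]

bracket = find_crossover(sizes, auc_clean, auc_weak)
print(bracket)  # (50, 100)
```

Reporting a bracket rather than a single number mirrors how the summary states ranges such as 20-50 and 250-500: on a discrete grid of sample sizes, the crossover can only be localized between two adjacent evaluated points.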
Key facts
- Study calibrates noisy-label crossover for BiomedCLIP weak labels
- Three benchmarks used: PCAM, ISIC, NIH-CXR
- Six downstream architectures tested across 11x parameter range
- Crossover at ~100 samples on PCAM, 20-50 on ISIC, 250-500 on NIH-CXR
- Weak labels above crossover degrade AUC by up to 0.10
- Crossover location architecture-invariant for four of five pretrained architectures
- DenseNet sweep (2.5x parameters, identical pretraining) supports theory
- Turns theoretical prediction into instance-level statement for foundation-model labelers
Entities
Institutions
- arXiv