Signed Entropy Integral Detects Mislabeled Images in Deep Learning
A new technique for identifying incorrectly labeled images in training datasets uses a signed entropy integral (SEI) statistic, which captures both the magnitude and the temporal trend of each sample's prediction entropy across training epochs. Correctly labeled samples show a consistent decrease in entropy as training progresses, whereas mislabeled samples retain relatively high entropy throughout. The method is broadly applicable to classification networks and is particularly effective with contrastive language-image pretraining (CLIP) architectures. Experiments on four medical imaging datasets demonstrate its effectiveness in a domain that is particularly susceptible to mislabeling.
Key facts
- Mislabeled samples degrade deep network performance because the network memorizes the erroneous labels.
- Correctly labeled samples exhibit consistent entropy decrease during training.
- Mislabeled samples maintain relatively high entropy throughout training.
- Signed entropy integral (SEI) captures magnitude and temporal trend of prediction entropy.
- SEI is broadly applicable to classification networks.
- SEI is particularly effective with contrastive language-image pretraining (CLIP) architectures.
- Experiments were conducted on four medical imaging datasets.
- Medical imaging is a domain particularly susceptible to mislabeling.
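The summary does not give the exact SEI formula, so the following is only a minimal sketch of one plausible reading, not the authors' definition. It assumes that per-sample softmax outputs are recorded at every epoch, that the "sign" comes from subtracting a reference entropy level (here, hypothetically, half of the maximum entropy log K for K classes), and that the integral is a trapezoidal sum over epochs. The function names and the `reference` parameter are illustrative.

```python
import numpy as np

def prediction_entropy(probs):
    """Shannon entropy (in nats) of each probability vector along the last axis."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=-1)

def signed_entropy_integral(probs_per_epoch, reference=None):
    """Illustrative SEI-style score per sample.

    probs_per_epoch: array of shape (n_epochs, n_samples, n_classes) holding
    the softmax output for every sample at every epoch.
    reference: entropy level treated as zero. ASSUMPTION: defaults to half of
    the maximum entropy log(n_classes); the source does not specify this.
    """
    probs_per_epoch = np.asarray(probs_per_epoch, dtype=float)
    n_classes = probs_per_epoch.shape[-1]
    entropy = prediction_entropy(probs_per_epoch)  # (n_epochs, n_samples)
    if reference is None:
        reference = 0.5 * np.log(n_classes)
    signed = entropy - reference  # positive while the model is still uncertain
    # Trapezoidal rule over epochs with unit spacing.
    return np.sum(0.5 * (signed[1:] + signed[:-1]), axis=0)

# Toy data: 3 epochs, 2 samples, 2 classes.
# Sample 0 becomes confident over time; sample 1 stays uncertain.
probs = np.array([
    [[0.60, 0.40], [0.50, 0.50]],
    [[0.90, 0.10], [0.55, 0.45]],
    [[0.99, 0.01], [0.50, 0.50]],
])
sei = signed_entropy_integral(probs)
ranking = np.argsort(-sei)  # highest score first: candidates for label review
```

Under these assumptions, a sample whose entropy falls steadily accumulates a low or negative score, while a sample whose entropy stays high accumulates a large positive one, so sorting by the score in descending order puts likely mislabeled samples at the top of the review list.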