Angular Separation Loss Improves Multi-Label VCE Classification
A new framework for multi-label temporal event detection in video capsule endoscopy (VCE) addresses the extreme class imbalance in the Galar dataset. It combines an Angular Separation Loss on class prototypes with a Biological State Machine temporal decoder. The backbone is BiomedCLIP, a biomedical vision-language foundation model. A Local Differencing Attention module fuses three consecutive frames to amplify transient pathological signals while suppressing static temporal redundancy. An Anatomy Context Head conditions pathology predictions on soft anatomical activations, exploiting the known spatial relationships of gastrointestinal findings. Learnable text-feature prompts and prototype-based logit augmentation are trained jointly with the Angular Separation Loss, which penalizes off-diagonal cosine similarity between prototypes to prevent prototype collapse.
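The penalty on off-diagonal cosine similarity can be illustrated with a minimal NumPy sketch. The function name `angular_separation_loss`, the squared penalty, and the mean reduction are assumptions made for illustration; the summary does not specify the exact form of the loss.

```python
import numpy as np

def angular_separation_loss(prototypes: np.ndarray, eps: float = 1e-8) -> float:
    """Mean squared off-diagonal cosine similarity between class prototypes.

    prototypes: array of shape (num_classes, dim), one row per class.
    The squared penalty and mean reduction are illustrative assumptions.
    """
    num_classes = prototypes.shape[0]
    if num_classes < 2:
        return 0.0  # no pairs to separate
    # L2-normalise each row so dot products become cosine similarities.
    norms = np.linalg.norm(prototypes, axis=1, keepdims=True)
    unit = prototypes / np.maximum(norms, eps)
    cos = unit @ unit.T  # (num_classes, num_classes)
    # Keep only the similarities between different classes.
    off_diag = ~np.eye(num_classes, dtype=bool)
    return float(np.mean(cos[off_diag] ** 2))

collapsed = np.ones((3, 4))   # every class shares one direction
separated = np.eye(3)         # mutually orthogonal prototypes
print(angular_separation_loss(collapsed))  # 1.0
print(angular_separation_loss(separated))  # 0.0
```

Under this assumed form the loss is largest when all prototypes point the same way (collapse) and zero when they are mutually orthogonal, which is the behaviour the summary attributes to the penalty.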
Key facts
- Framework targets multi-label temporal event detection in VCE.
- Addresses extreme class imbalance in the Galar dataset.
- Uses Angular Separation Loss on class prototypes.
- Employs Biological State Machine temporal decoder.
- Backbone is BiomedCLIP.
- Local Differencing Attention module fuses three consecutive frames.
- Anatomy Context Head conditions predictions on anatomical activations.
- Learnable text-feature prompts and prototype-based logit augmentation are used.
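The three-frame fusion listed above can be sketched as follows. The summary does not describe the internals of the Local Differencing Attention module, so everything here is a hypothetical stand-in: the function name `local_differencing_fusion`, the use of difference magnitudes as attention scores, and the residual combination. The actual module presumably uses learned parameters.

```python
import numpy as np

def local_differencing_fusion(prev_f, curr_f, next_f):
    """Fuse three consecutive frame embeddings, emphasising temporal change.

    Each argument is a 1-D feature vector of the same length.
    Hypothetical sketch; not the published module.
    """
    # Differences between neighbouring frames; static content cancels out.
    diffs = np.stack([curr_f - prev_f, next_f - curr_f])  # (2, dim)
    # Assumed scoring rule: larger change receives more attention.
    scores = np.linalg.norm(diffs, axis=1)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    # Residual fusion: centre frame plus attention-weighted differences.
    return curr_f + weights @ diffs

frame = np.array([0.2, 0.5, 0.1])
static = local_differencing_fusion(frame, frame, frame)
changed = local_differencing_fusion(frame, frame, frame + np.array([0.0, 1.0, 0.0]))
print(np.allclose(static, frame))   # True: identical frames add nothing
print(np.allclose(changed, frame))  # False: the transient change is carried through
```

In this sketch a static sequence passes the centre frame through unchanged, while a frame-to-frame change shifts the fused feature, mirroring the stated goal of enhancing transient signals and reducing static redundancy.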