LipB-ViT: Bayesian Vision Transformer for Label Noise
A new architecture-agnostic, Lipschitz-constrained Bayesian header addresses label noise in supervised deep learning, particularly semantically proximal classification errors. Integrated with vision transformers, it forms the bi-Lipschitz-constrained Bayesian Vision Transformer (LipB-ViT). Unlike conventional Bayesian layers, it enforces spectral normalization on both the mean and the log-variance of the variational weights, promoting calibrated predictive uncertainty and reducing noise amplification. A novel metric jointly captures uncertainty and confidence across misclassification rates, and an adaptive arithmetic-mean fusion scheme combines feature-space proximity scores with the Bayesian header's predictions. The approach is validated on vision transformer backbones and shows improved robustness to structured label noise.
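The summary does not give the layer's exact form, but the core idea — spectrally normalizing both the mean and the log-variance weight matrices of a variational linear header before sampling — can be sketched as follows. This is a minimal, framework-free illustration; the class name `SpectralBayesianHeader`, the softplus parameterization of the standard deviation, and the power-iteration settings are assumptions, not the paper's implementation.

```python
import numpy as np

def spectral_norm(W, n_iters=50):
    """Estimate the largest singular value of W by power iteration
    and return W rescaled to unit spectral norm."""
    v = np.random.default_rng(0).normal(size=W.shape[1])
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
    sigma = u @ W @ v  # approximate top singular value
    return W / max(sigma, 1e-12)

class SpectralBayesianHeader:
    """Variational linear header (sketch): both the mean matrix and the
    pre-softplus scale ('log-variance') matrix are spectrally
    normalized before the reparameterized weight sample is drawn."""
    def __init__(self, in_dim, out_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W_mu = self.rng.normal(scale=0.1, size=(out_dim, in_dim))
        self.W_rho = self.rng.normal(scale=0.1, size=(out_dim, in_dim))

    def forward(self, x):
        mu = spectral_norm(self.W_mu)    # constrain the mean weights
        rho = spectral_norm(self.W_rho)  # constrain the variance parameters
        std = np.log1p(np.exp(rho))      # softplus -> positive std
        eps = self.rng.normal(size=mu.shape)
        W = mu + std * eps               # reparameterization trick
        return x @ W.T
```

Constraining both matrices bounds how much the header can amplify perturbations in either the predictive mean or the predictive variance, which is the stated motivation for reduced noise amplification.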
Key facts
- Label noise is a critical bottleneck for supervised deep learning generalization.
- Errors are often structured rather than random.
- Standard robust training methods fail on semantically proximal classification errors.
- The approach is architecture-agnostic and integrates with feature extractors like vision transformers.
- LipB-ViT enforces spectral normalization on mean and log-variance of variational weights.
- A novel metric jointly captures uncertainty and confidence across misclassification rates.
- An adaptive arithmetic-mean fusion scheme combines feature-space proximity with the Bayesian header.
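The summary names the fusion rule but not its exact form. One plausible reading — an arithmetic mean of the proximity-based and Bayesian-header class distributions with a per-sample adaptive weight — is sketched below. The entropy-based weighting, the prototype-distance proximity score, and all function names here are assumptions for illustration only.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def proximity_probs(features, prototypes):
    """Feature-space proximity scores: softmax over negative
    Euclidean distances to per-class prototype vectors."""
    d = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    return softmax(-d, axis=-1)

def adaptive_fusion(p_bayes, p_prox):
    """Adaptive arithmetic-mean fusion (sketch): weight the Bayesian
    header's distribution by its own normalized predictive entropy,
    trusting it more when it is confident."""
    n_cls = p_bayes.shape[-1]
    ent = -(p_bayes * np.log(p_bayes + 1e-12)).sum(-1) / np.log(n_cls)
    alpha = (1.0 - ent)[:, None]  # hypothetical adaptive weight in [0, 1]
    return alpha * p_bayes + (1.0 - alpha) * p_prox
```

Because the fused output is a convex combination of two probability distributions, it remains a valid distribution for any per-sample weight in [0, 1].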
- The work is published on arXiv with ID 2605.05908.