LipB-ViT: Bayesian Vision Transformer for Label Noise
A new architecture-agnostic, Lipschitz-constrained Bayesian header addresses label noise in supervised deep learning, particularly semantically proximal classification errors. Integrated with vision transformers, it forms the bi-Lipschitz-constrained Bayesian Vision Transformer (LipB-ViT). Unlike conventional Bayesian layers, it enforces spectral normalization on both the mean and the log-variance of the variational weights, promoting calibrated predictive uncertainty and reducing noise amplification. A novel metric jointly captures uncertainty and confidence across misclassification rates, and an adaptive arithmetic-mean fusion scheme combines feature-space proximity scores with the Bayesian header's predictions. The approach is validated on vision transformer backbones and shows improved robustness to structured label noise.
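The summary does not give the layer's exact form, but the core idea — spectrally normalizing both the mean and the log-variance weight matrices of a variational linear header before sampling — can be sketched as follows. This is a minimal, framework-free illustration; the class name `SpectralBayesianHeader`, the softplus parameterization of the standard deviation, and the power-iteration settings are assumptions, not the paper's implementation.

```python
import numpy as np

def spectral_norm(W, n_iters=50):
    """Estimate the largest singular value of W by power iteration
    and return W rescaled to unit spectral norm."""
    v = np.random.default_rng(0).normal(size=W.shape[1])
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
    sigma = u @ W @ v  # approximate top singular value
    return W / max(sigma, 1e-12)

class SpectralBayesianHeader:
    """Variational linear header (sketch): both the mean matrix and the
    pre-softplus scale ('log-variance') matrix are spectrally
    normalized before the reparameterized weight sample is drawn."""
    def __init__(self, in_dim, out_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W_mu = self.rng.normal(scale=0.1, size=(out_dim, in_dim))
        self.W_rho = self.rng.normal(scale=0.1, size=(out_dim, in_dim))

    def forward(self, x):
        mu = spectral_norm(self.W_mu)    # constrain the mean weights
        rho = spectral_norm(self.W_rho)  # constrain the variance parameters
        std = np.log1p(np.exp(rho))      # softplus -> positive std
        eps = self.rng.normal(size=mu.shape)
        W = mu + std * eps               # reparameterization trick
        return x @ W.T
```

Constraining both matrices bounds how much the header can amplify perturbations in either the predictive mean or the predictive variance, which is the stated motivation for reduced noise amplification.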
Key facts
- Label noise is a critical bottleneck for supervised deep learning generalization.
- Errors are often structured rather than random.
- Standard robust training methods fail on semantically proximal classification errors.
- The approach is architecture-agnostic and integrates with feature extractors like vision transformers.
- LipB-ViT enforces spectral normalization on mean and log-variance of variational weights.
- A novel metric jointly captures uncertainty and confidence across misclassification rates.
- An adaptive arithmetic-mean fusion scheme combines feature-space proximity with the Bayesian header.
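The summary names the fusion rule but not its exact form. One plausible reading — an arithmetic mean of the proximity-based and Bayesian-header class distributions with a per-sample adaptive weight — is sketched below. The entropy-based weighting, the prototype-distance proximity score, and all function names here are assumptions for illustration only.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def proximity_probs(features, prototypes):
    """Feature-space proximity scores: softmax over negative
    Euclidean distances to per-class prototype vectors."""
    d = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    return softmax(-d, axis=-1)

def adaptive_fusion(p_bayes, p_prox):
    """Adaptive arithmetic-mean fusion (sketch): weight the Bayesian
    header's distribution by its own normalized predictive entropy,
    trusting it more when it is confident."""
    n_cls = p_bayes.shape[-1]
    ent = -(p_bayes * np.log(p_bayes + 1e-12)).sum(-1) / np.log(n_cls)
    alpha = (1.0 - ent)[:, None]  # hypothetical adaptive weight in [0, 1]
    return alpha * p_bayes + (1.0 - alpha) * p_prox
```

Because the fused output is a convex combination of two probability distributions, it remains a valid distribution for any per-sample weight in [0, 1].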
- The work is published on arXiv with ID 2605.05908.