ARTFEED — Contemporary Art Intelligence

LipB-ViT: Bayesian Vision Transformer for Label Noise

ai-technology · 2026-05-09

A new architecture-agnostic Lipschitz-constrained Bayesian header addresses label noise in supervised deep learning, particularly semantically proximal classification errors. Integrated with a vision transformer backbone, it forms the bi-Lipschitz-constrained Bayesian Vision Transformer (LipB-ViT). Unlike conventional Bayesian layers, the header enforces spectral normalization on both the mean and the log-variance of the variational weights, promoting calibrated predictive uncertainty and limiting noise amplification. A novel metric jointly captures uncertainty and confidence across misclassification rates, and an adaptive arithmetic-mean fusion scheme combines feature-space proximity with the Bayesian header's predictions. The approach is validated on vision transformer backbones and shows improved robustness to structured label noise.
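To make the core idea concrete, here is a minimal NumPy sketch of a variational linear header in which spectral normalization (estimated by standard power iteration) is applied to both the mean and the log-variance weight matrices before reparameterized sampling. The class name, layer shape, and initialization are hypothetical; the paper's actual header architecture is not specified in this summary.

```python
import numpy as np

def spectral_normalize(W, n_iter=50):
    """Rescale W so its largest singular value is at most 1.

    Uses power iteration to estimate the spectral norm; W is left
    unchanged if it already satisfies the constraint.
    """
    u = np.ones(W.shape[0]) / np.sqrt(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W @ v  # estimated largest singular value
    return W / max(sigma, 1.0)

class LipschitzBayesianHead:
    """Hypothetical variational linear head: each weight is Gaussian,
    and spectral normalization constrains BOTH the mean matrix and the
    log-variance matrix, bounding how much the head can amplify noise."""

    def __init__(self, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_mu = rng.normal(0.0, 0.5, (d_out, d_in))
        self.W_logvar = rng.normal(-4.0, 0.5, (d_out, d_in))

    def forward(self, x, rng):
        # Constrain both variational parameter matrices.
        W_mu = spectral_normalize(self.W_mu)
        W_logvar = spectral_normalize(self.W_logvar)
        # Reparameterization trick: W = mu + eps * std.
        eps = rng.normal(size=W_mu.shape)
        W = W_mu + eps * np.exp(0.5 * W_logvar)
        return x @ W.T
```

Averaging `forward` over several sampled weight draws gives the usual Monte Carlo predictive estimate, whose spread can serve as the predictive uncertainty the abstract refers to.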

Key facts

  • Label noise is a critical bottleneck for supervised deep learning generalization.
  • Errors are often structured rather than random.
  • Standard robust training methods fail on semantically proximal classification errors.
  • The approach is architecture-agnostic and integrates with feature extractors like vision transformers.
  • LipB-ViT enforces spectral normalization on both the mean and the log-variance of the variational weights.
  • A novel metric jointly captures uncertainty and confidence across misclassification rates.
  • An adaptive arithmetic-mean fusion scheme combines feature-space proximity with the Bayesian header.
  • The work is published on arXiv with ID 2605.05908.
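The fusion step described above can be sketched as follows. Feature-space proximity is turned into class probabilities via a softmax over negative distances to per-class prototypes, then averaged with the Bayesian header's probabilities. The confidence-based (entropy) weighting used here is an assumption for illustration; the summary does not state the paper's exact adaptive rule, only that the fusion is an adaptive arithmetic mean.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prototype_probs(feats, prototypes):
    """Feature-space proximity branch: softmax over negative Euclidean
    distances from each feature vector to per-class prototypes."""
    d = np.linalg.norm(feats[:, None, :] - prototypes[None, :, :], axis=-1)
    return softmax(-d)

def adaptive_fusion(p_bayes, p_proto):
    """Adaptive arithmetic-mean fusion of two probability branches.

    Each branch is weighted by its normalized confidence
    (1 - entropy / max_entropy); equally confident branches reduce to a
    plain arithmetic mean. This weighting rule is a hypothetical choice.
    """
    def confidence(p):
        H = -(p * np.log(p + 1e-12)).sum(axis=-1, keepdims=True)
        return 1.0 - H / np.log(p.shape[-1])
    w_b = confidence(p_bayes)
    w_p = confidence(p_proto)
    w = w_b / (w_b + w_p + 1e-12)
    return w * p_bayes + (1.0 - w) * p_proto
```

Because the weights sum to one per sample, the fused output remains a valid probability distribution, and the more confident branch dominates on a per-sample basis.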

Entities

Institutions

  • arXiv

Sources