Federated Multi-Label Prompt Tuning for Vision-Language Models

publication · 2026-05-28

A new paper on arXiv (2605.28347) introduces FedMPT, described as the first method designed specifically for federated multi-label recognition (MLR) with vision-language models (VLMs). It targets a particular failure mode: when VLMs are adapted to private, heterogeneous client data in a decentralized setting, they tend to overfit to spurious label correlations. FedMPT applies a causal model with front-door adjustment, decoupling the MLR process through intermediate variables that amplify oracle label co-occurrence. This steers the model toward generalizable conditions and reduces erroneous label activation.

Key facts

Paper ID: arXiv:2605.28347
Title: FedMPT: Federated Multi-label Prompt Tuning of Vision-Language Models
First method for federated multi-label recognition with vision-language models
Uses causal model with front-door adjustment
Addresses overfitting to spurious label correlations
Decouples MLR via intermediate variables
Amplifies oracle label co-occurrence
Focuses on decentralized, heterogeneous client data
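
For readers unfamiliar with the front-door adjustment named above, the general formula can be illustrated on a toy discrete example. This is a minimal sketch of the textbook identity P(y | do(x)) = sum_m P(m | x) * sum_x' P(y | m, x') * P(x'), not FedMPT's implementation. The variable roles (x as the input, m as an intermediate variable, y as a label activation), the function name front_door, and every probability table are hypothetical, chosen only for illustration; the summary does not state how the paper instantiates these quantities.

```python
def front_door(p_x, p_m_given_x, p_y_given_mx):
    """Textbook front-door adjustment for discrete variables.

    p_x[x]                 -- P(x)
    p_m_given_x[x][m]      -- P(m | x)
    p_y_given_mx[m][x][y]  -- P(y | m, x)
    Returns a table with result[x][y] = P(y | do(x)).
    """
    n_x = len(p_x)
    n_m = len(p_y_given_mx)
    n_y = len(p_y_given_mx[0][0])

    # Inner sum: P(y | do(m)) = sum_x' P(y | m, x') * P(x')
    p_y_do_m = [
        [sum(p_y_given_mx[m][xp][y] * p_x[xp] for xp in range(n_x))
         for y in range(n_y)]
        for m in range(n_m)
    ]

    # Outer sum: P(y | do(x)) = sum_m P(m | x) * P(y | do(m))
    return [
        [sum(p_m_given_x[x][m] * p_y_do_m[m][y] for m in range(n_m))
         for y in range(n_y)]
        for x in range(n_x)
    ]


# Hypothetical toy tables (all variables binary), made up for illustration.
p_x = [0.5, 0.5]
p_m_given_x = [
    [0.9, 0.1],  # P(m | x=0)
    [0.2, 0.8],  # P(m | x=1)
]
p_y_given_mx = [
    [[0.8, 0.2], [0.6, 0.4]],  # m=0: P(y | m=0, x=0), P(y | m=0, x=1)
    [[0.3, 0.7], [0.1, 0.9]],  # m=1: P(y | m=1, x=0), P(y | m=1, x=1)
]

p_y_do_x = front_door(p_x, p_m_given_x, p_y_given_mx)
```

With these tables, P(y=1 | do(x=0)) comes out to about 0.35 and P(y=1 | do(x=1)) to about 0.70. The point of the construction is that the effect of x on y is computed only through the intermediate variable m, so a hidden confounder that influences both x and y directly does not bias the estimate.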
