Federated Alignment Framework for Heterogeneous Vision-Language Models
MoR (Mixture-of-Rewards) is a novel federated alignment framework introduced to address the difficulty of training Vision-Language Models (VLMs) under extreme model and data heterogeneity. It combines GRPO (Group Relative Policy Optimization) with a Mixture-of-Rewards strategy, enabling decentralized training without any direct exchange of parameters or data. In MoR, each client independently trains a reward model on its own local preference annotations, capturing client-specific evaluation signals while preserving privacy; preference-based collaboration thus replaces conventional parameter aggregation. The approach is particularly relevant in privacy-sensitive sectors such as healthcare and finance, where data-sharing restrictions make centralized training infeasible. The research is available on arXiv under identifier 2605.03426.
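The paper's exact reward-model architecture and objective are not detailed here, so the following is a minimal sketch of the client-side step, assuming a standard Bradley-Terry pairwise loss over (chosen, rejected) preference pairs. The `RewardHead` module, the `preference_loss` function, and the random tensors standing in for VLM response embeddings are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    """Hypothetical scalar reward head scoring (frozen) VLM response embeddings."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.score(emb).squeeze(-1)

def preference_loss(rm: RewardHead, chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: the preferred response should score higher."""
    return -F.logsigmoid(rm(chosen) - rm(rejected)).mean()

# Toy local training loop over one client's private preference pairs.
# The data never leaves the client; only the learned reward signal is used later.
dim = 16
rm = RewardHead(dim)
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
for _ in range(100):
    chosen = torch.randn(8, dim)    # stand-in embeddings of preferred responses
    rejected = torch.randn(8, dim)  # stand-in embeddings of dispreferred responses
    loss = preference_loss(rm, chosen, rejected)
    opt.zero_grad()
    loss.backward()
    opt.step()
```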
Key facts
- MoR combines GRPO with Mixture-of-Rewards for heterogeneous VLMs (see the sketch after this list)
- Each client locally trains a reward model from local preference annotations
- Eliminates direct parameter or data exchange
- Addresses extreme model and data heterogeneity
- Applicable to privacy-sensitive domains like healthcare and finance
- Published on arXiv with identifier 2605.03426
- Federated alignment framework for decentralized training
- Preference-based collaboration replaces parameter aggregation
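As a rough illustration of how the per-client reward signals could feed GRPO, the sketch below computes group-relative advantages over a group of sampled responses. The group-wise standardization is the standard GRPO formulation; the uniform weighting used to mix the client reward scores is purely an assumption, since the paper's aggregation rule is not described here.

```python
import torch

def mixture_reward(client_scores: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Mix per-client reward scores into one signal per response.
    client_scores: [num_clients, group_size]; weights: [num_clients].
    A weighted average is one plausible mixture rule (an assumption here)."""
    return (weights[:, None] * client_scores).sum(dim=0)

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as in GRPO: standardize rewards within the
    group of responses sampled for the same prompt, so no critic is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy example: 3 client reward models each score a group of 4 sampled responses.
scores = torch.tensor([[0.2, 0.9, 0.4, 0.1],
                       [0.3, 0.8, 0.5, 0.2],
                       [0.1, 0.7, 0.6, 0.3]])
weights = torch.full((3,), 1 / 3)  # uniform mixture; actual weighting unknown
adv = grpo_advantages(mixture_reward(scores, weights))
print(adv)  # the highest-scoring response receives the largest positive advantage
```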
Entities
Institutions
- arXiv