Federated Alignment Framework for Heterogeneous Vision-Language Models
MoR (Mixture-of-Rewards) is a novel federated alignment framework introduced to address the difficulty of training Vision-Language Models (VLMs) under extreme model and data heterogeneity. It combines GRPO (Group Relative Policy Optimization) with a Mixture-of-Rewards strategy, enabling decentralized training without any direct exchange of parameters or data. In MoR, each client independently trains a reward model on its own local preference annotations, capturing client-specific evaluation signals while preserving privacy; preference-based collaboration thus replaces conventional parameter aggregation. The approach is particularly relevant in privacy-sensitive sectors such as healthcare and finance, where data-sharing restrictions make centralized training infeasible. The research is available on arXiv under identifier 2605.03426.
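The paper's exact reward-model architecture and objective are not detailed here, so the following is a minimal sketch of the client-side step, assuming a standard Bradley-Terry pairwise loss over (chosen, rejected) preference pairs. The `RewardHead` module, the `preference_loss` function, and the random tensors standing in for VLM response embeddings are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardHead(nn.Module):
    """Hypothetical scalar reward head scoring (frozen) VLM response embeddings."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.score(emb).squeeze(-1)

def preference_loss(rm: RewardHead, chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: the preferred response should score higher."""
    return -F.logsigmoid(rm(chosen) - rm(rejected)).mean()

# Toy local training loop over one client's private preference pairs.
# The data never leaves the client; only the learned reward signal is used later.
dim = 16
rm = RewardHead(dim)
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
for _ in range(100):
    chosen = torch.randn(8, dim)    # stand-in embeddings of preferred responses
    rejected = torch.randn(8, dim)  # stand-in embeddings of dispreferred responses
    loss = preference_loss(rm, chosen, rejected)
    opt.zero_grad()
    loss.backward()
    opt.step()
```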
Key facts
- MoR combines GRPO with Mixture-of-Rewards for heterogeneous VLMs (see the sketch after this list)
- Each client locally trains a reward model from local preference annotations
- Eliminates direct parameter or data exchange
- Addresses extreme model and data heterogeneity
- Applicable to privacy-sensitive domains like healthcare and finance
- Published on arXiv with identifier 2605.03426
- Federated alignment framework for decentralized training
- Preference-based collaboration replaces parameter aggregation
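As a rough illustration of how the per-client reward signals could feed GRPO, the sketch below computes group-relative advantages over a group of sampled responses. The group-wise standardization is the standard GRPO formulation; the uniform weighting used to mix the client reward scores is purely an assumption, since the paper's aggregation rule is not described here.

```python
import torch

def mixture_reward(client_scores: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Mix per-client reward scores into one signal per response.
    client_scores: [num_clients, group_size]; weights: [num_clients].
    A weighted average is one plausible mixture rule (an assumption here)."""
    return (weights[:, None] * client_scores).sum(dim=0)

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages as in GRPO: standardize rewards within the
    group of responses sampled for the same prompt, so no critic is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy example: 3 client reward models each score a group of 4 sampled responses.
scores = torch.tensor([[0.2, 0.9, 0.4, 0.1],
                       [0.3, 0.8, 0.5, 0.2],
                       [0.1, 0.7, 0.6, 0.3]])
weights = torch.full((3,), 1 / 3)  # uniform mixture; actual weighting unknown
adv = grpo_advantages(mixture_reward(scores, weights))
print(adv)  # the highest-scoring response receives the largest positive advantage
```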
Entities
Institutions
- arXiv