ARTFEED — Contemporary Art Intelligence

Federated Alignment Framework for Heterogeneous Vision-Language Models

ai-technology · 2026-05-07

A novel federated alignment framework, MoR (Mixture-of-Rewards), has been introduced to address the challenge of training Vision-Language Models (VLMs) under extreme model and data heterogeneity. The framework combines GRPO (Group Relative Policy Optimization) with a Mixture-of-Rewards strategy, enabling decentralized training without any direct exchange of parameters or data. Within MoR, each client independently trains a reward model from its local preference annotations, capturing client-specific evaluation signals while preserving privacy. The approach is particularly relevant to privacy-sensitive domains such as healthcare and finance, where data-sharing restrictions make centralized training infeasible. The paper is available on arXiv under identifier 2605.03426.

Key facts

  • MoR combines GRPO with Mixture-of-Rewards for heterogeneous VLMs
  • Each client locally trains a reward model from local preference annotations
  • Eliminates direct parameter or data exchange
  • Addresses extreme model and data heterogeneity
  • Applicable to privacy-sensitive domains like healthcare and finance
  • Published on arXiv with identifier 2605.03426
  • Federated alignment framework for decentralized training
  • Preference-based collaboration replaces parameter aggregation
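
The mechanics described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function names (`client_reward`, `mixture_of_rewards`, `grpo_advantages`), the linear reward models, and the equal mixture coefficients are all assumptions for demonstration. The key ideas shown are that clients report only scalar reward scores (never parameters or data), the server mixes those scores, and GRPO-style advantages are computed relative to a group of candidate responses.

```python
import statistics

def client_reward(weights, features):
    # Stub for a reward model trained locally from preference
    # annotations (here: a simple linear scorer, an assumption).
    return sum(w * f for w, f in zip(weights, features))

def mixture_of_rewards(client_weight_sets, features, mix_coeffs):
    # Combine per-client reward signals without sharing parameters
    # or data: each client contributes only a scalar score.
    scores = [client_reward(w, features) for w in client_weight_sets]
    return sum(c * s for c, s in zip(mix_coeffs, scores))

def grpo_advantages(group_rewards):
    # GRPO-style group-relative advantage: normalize each reward
    # against the mean and standard deviation of its sampled group.
    mu = statistics.mean(group_rewards)
    sigma = statistics.pstdev(group_rewards) or 1.0
    return [(r - mu) / sigma for r in group_rewards]

# Example: two clients score a group of three candidate responses.
clients = [[0.8, 0.2], [0.3, 0.7]]            # locally trained reward weights
mix = [0.5, 0.5]                              # equal mixture coefficients
group = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # candidate feature vectors

rewards = [mixture_of_rewards(clients, f, mix) for f in group]
advs = grpo_advantages(rewards)
```

Because the advantages are centered on the group mean, they sum to zero; candidates the mixed reward prefers get positive advantages and the rest negative, which is the signal a GRPO-style policy update would consume.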

Entities

Institutions

  • arXiv

Sources