Federated Preference Alignment with Gumbel-Softmax Prior

ai-technology · 2026-06-01

A new framework, Federated Variational Preference Alignment with Gumbel-Softmax Prior (FedVPA-GP), targets the problem of personalizing large language models (LLMs) in federated learning settings. Traditional federated learning aligns LLMs with a single reward model, which averages away conflicting user preferences such as helpfulness versus harmlessness. Variational Preference Learning (VPL) offers personalization, but in decentralized settings it suffers from posterior collapse because each client's local data is scarce and heterogeneous. FedVPA-GP introduces a Federated Mixture Prior that lets clients use the aggregate population distribution as a dynamic prior, which stabilizes variational inference. In addition, an Orthogonal Loss explicitly enforces separation between diverse preferences. The framework aims to disentangle preferences without compromising privacy. The paper is available on arXiv under identifier 2605.30873.
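The summary does not give the paper's equations, so the following is only a minimal sketch of the general idea of a population-level mixture prior, not the authors' implementation. It assumes a discrete latent with K preference modes, a server that forms the prior as a weighted average of per-client categorical posteriors, and the standard Gumbel-Softmax relaxation for sampling. The names `mixture_prior`, `kl_categorical`, and `gumbel_softmax_sample`, and the toy client values, are hypothetical.

```python
import math
import random


def mixture_prior(client_posteriors, weights=None):
    """Weighted average of per-client categorical posteriors (assumed aggregation rule)."""
    n = len(client_posteriors)
    if weights is None:
        weights = [1.0 / n] * n  # uniform client weights by default
    k = len(client_posteriors[0])
    return [
        sum(w * q[j] for w, q in zip(weights, client_posteriors))
        for j in range(k)
    ]


def kl_categorical(q, p, eps=1e-12):
    """KL(q || p) for two categorical distributions over the same modes."""
    return sum(qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p))


def gumbel_softmax_sample(logits, tau=0.5, rng=random):
    """Standard Gumbel-Softmax relaxation: softmax((logits + Gumbel noise) / tau)."""
    noise = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    scores = [(l + g) / tau for l, g in zip(logits, noise)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy data: three clients, two preference modes
# (for example, helpfulness-leaning versus harmlessness-leaning).
clients = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
prior = mixture_prior(clients)

# Each client would regularize its local posterior toward the shared prior.
kl_terms = [kl_categorical(q, prior) for q in clients]

# Relaxed one-hot sample drawn from the prior's log-probabilities.
sample = gumbel_softmax_sample([math.log(p) for p in prior], rng=random.Random(0))
```

In this sketch the KL term pulls each client toward the population distribution rather than toward a fixed uninformative prior, which is one plausible way a shared prior could counteract collapse when local data is scarce.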

Key facts

FedVPA-GP is a framework for personalized federated learning with LLMs.
Traditional federated learning (FL) aligns LLMs with a monolithic reward model, averaging conflicting preferences.
Variational Preference Learning (VPL) offers personalization but faces posterior collapse in decentralized settings.
Posterior collapse is driven by local data scarcity and heterogeneity.
FedVPA-GP introduces a Federated Mixture Prior that uses the aggregate population distribution as a dynamic prior.
An Orthogonal Loss enforces separation of diverse preferences.
The framework aims to disentangle preferences without compromising privacy.
The paper is available on arXiv under identifier 2605.30873.
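
The exact form of the Orthogonal Loss is not stated in this summary. As a rough illustration only, one common way to push a set of preference representations apart is to penalize the squared cosine similarity between every pair of them; the function name `orthogonal_penalty` and the example vectors below are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def orthogonal_penalty(vectors):
    """Sum of squared cosine similarities over all distinct pairs of vectors.

    The penalty is zero when the vectors are mutually orthogonal and grows
    as any two of them point in similar (or opposite) directions.
    """
    unit = [normalize(v) for v in vectors]
    total = 0.0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            total += cos * cos
    return total


# Two orthogonal preference directions incur no penalty.
separated = orthogonal_penalty([[1.0, 0.0], [0.0, 2.0]])

# Two parallel directions incur the maximum pairwise penalty (close to 1).
entangled = orthogonal_penalty([[1.0, 1.0], [2.0, 2.0]])
```

Adding a term like this to a training objective rewards representations of different preferences for occupying distinct directions, which matches the stated goal of separating diverse preferences.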
