DPPrefSyn: Differentially Private Synthetic Data for LLM Alignment

ai-technology · 2026-06-01

Researchers have introduced DPPrefSyn, an algorithm that generates differentially private synthetic preference data to support privacy-preserving alignment of large language models (LLMs). The motivation is that post-training on real human preference data carries privacy risk: both the user prompts and the human judgments attached to them can contain confidential details. DPPrefSyn is grounded in the Bradley-Terry preference model and in the geometric structure of pairwise human preference data, and it works in two stages. First, it learns a preference model from the private data under formal differential privacy guarantees. Second, it combines that model with public prompts to synthesize high-quality preference data. By exploiting the shared linear structure of per-cluster reward models, the method aims to capture diverse human preferences while protecting sensitive user prompts and judgments. The paper is available on arXiv under the identifier 2605.30808.
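To make the first stage concrete, here is a minimal Python sketch of learning a Bradley-Terry preference model under differential privacy. It is not the DPPrefSyn training procedure, which this summary does not describe. It assumes a linear reward r(x, y) = θ·φ(x, y) over a fixed feature map φ, so each comparison is represented by the feature difference between the winning and losing response. It then swaps in a generic mechanism: noisy full-batch gradient descent with per-pair gradient clipping and Gaussian noise (the Gaussian mechanism that DP-SGD also uses). The function names, toy data, and parameter values are hypothetical. The privacy accountant treats each step as a Gaussian mechanism with sensitivity `clip_norm` under add/remove-one-pair neighbors, treats the dataset size as public, composes steps in zCDP, and converts the result to (ε, δ)-DP.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bt_prob(theta: Sequence[float], diff: Sequence[float]) -> float:
    # Bradley-Terry: P(a preferred over b) = sigmoid(r(x, a) - r(x, b)).
    # With an (assumed) linear reward r = theta . phi this is sigmoid(theta . diff),
    # where diff = phi(x, a) - phi(x, b).
    return sigmoid(dot(theta, diff))


def clip(vec: Sequence[float], max_norm: float) -> Vector:
    # Rescale vec so that its L2 norm is at most max_norm.
    norm = math.sqrt(dot(vec, vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def dp_fit_bt(diffs: Sequence[Sequence[float]], dim: int, steps: int = 100,
              lr: float = 0.5, clip_norm: float = 1.0,
              noise_mult: float = 10.0, seed: int = 0) -> Vector:
    # Noisy full-batch gradient descent on the Bradley-Terry negative
    # log-likelihood. Each entry of diffs is phi(winner) - phi(loser).
    rng = random.Random(seed)
    theta = [0.0] * dim
    n = len(diffs)  # dataset size, treated as public in this sketch
    for _ in range(steps):
        total = [0.0] * dim
        for d in diffs:
            # Gradient of -log sigmoid(theta . d) is -sigmoid(-theta . d) * d.
            w = sigmoid(-dot(theta, d))
            g = clip([-w * x for x in d], clip_norm)
            total = [t + gi for t, gi in zip(total, g)]
        # Gaussian mechanism: the clipped sum has L2 sensitivity clip_norm.
        noisy = [t + rng.gauss(0.0, noise_mult * clip_norm) for t in total]
        theta = [th - lr * g / n for th, g in zip(theta, noisy)]
    return theta


def epsilon_from_zcdp(steps: int, noise_mult: float, delta: float) -> float:
    # Each step is (1 / (2 * noise_mult**2))-zCDP; zCDP composes additively.
    rho = steps / (2.0 * noise_mult ** 2)
    # Standard conversion from rho-zCDP to (epsilon, delta)-DP.
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


def make_toy_pairs(true_theta: Sequence[float], n: int, seed: int = 1) -> List[Vector]:
    # Hypothetical toy data: random feature differences, labelled by sampling
    # the Bradley-Terry model and then oriented as winner minus loser.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        d = [rng.gauss(0.0, 1.0) for _ in true_theta]
        if rng.random() >= bt_prob(true_theta, d):
            d = [-x for x in d]  # the second response won, so flip orientation
        pairs.append(d)
    return pairs


if __name__ == "__main__":
    data = make_toy_pairs([1.0, -0.5], n=1000)
    theta = dp_fit_bt(data, dim=2)
    print("private theta:", theta)
    print("epsilon at delta=1e-5:", epsilon_from_zcdp(100, 10.0, 1e-5))
```

With these toy settings (100 steps, noise multiplier 10, δ = 1e-5) the accountant reports ε of roughly 5.3. Clipping biases the fit toward smaller weights, so the recovered θ points in roughly the same direction as the generating weights rather than matching them exactly. A real system would tune clipping, noise, and step count jointly against a target privacy budget.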

Key facts

DPPrefSyn is a novel algorithm for differentially private synthetic preference data generation.
It is grounded in the Bradley-Terry preference model and the geometric structure of pairwise preference data.
The algorithm learns a preference model from private data with differential privacy guarantees.
It uses public prompts to synthesize high-quality preference data.
It exploits the shared linear structure of per-cluster reward models to capture diverse human preferences.
The work addresses privacy concerns in LLM post-training on human preference data.
The paper is available on arXiv with ID 2605.30808.
The approach aims to protect sensitive user prompts and human judgments.
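The second stage, which combines a privately learned model with public prompts, can be sketched as follows. This is a hypothetical illustration rather than the paper's method. It reads "shared linear structure of per-cluster reward models" as every cluster k scoring responses with its own weight vector θ_k over one common feature map φ, and it assumes that the per-cluster weights and cluster shares were already released with differential privacy. It also assumes that the candidate generator (`candidates`) and feature map (`featurize`) depend only on public data. Under those assumptions, the synthetic pairs are a function of DP outputs and public inputs only. By the post-processing property of differential privacy, they therefore keep the same (ε, δ) guarantee as the released parameters.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class SyntheticPair:
    prompt: str
    chosen: str
    rejected: str
    cluster: int  # which per-cluster reward model produced the label


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def synthesize(
    prompts: Sequence[str],
    candidates: Callable[[str], Tuple[str, str]],
    featurize: Callable[[str, str], List[float]],
    thetas: Sequence[Sequence[float]],
    weights: Sequence[float],
    seed: int = 0,
) -> List[SyntheticPair]:
    # thetas[k]: cluster k's linear reward weights over the shared features.
    # weights[k]: the (assumed privately released) share of cluster k.
    rng = random.Random(seed)
    out: List[SyntheticPair] = []
    for prompt in prompts:
        a, b = candidates(prompt)
        # Pick which cluster's preferences this synthetic example reflects.
        k = rng.choices(range(len(thetas)), weights=weights)[0]
        fa, fb = featurize(prompt, a), featurize(prompt, b)
        margin = sum(t * (x - y) for t, x, y in zip(thetas[k], fa, fb))
        # Sample the preference label from cluster k's Bradley-Terry model.
        if rng.random() < sigmoid(margin):
            out.append(SyntheticPair(prompt, a, b, k))
        else:
            out.append(SyntheticPair(prompt, b, a, k))
    return out


# Hypothetical stand-ins for a public response generator and feature map.
def toy_candidates(prompt: str) -> Tuple[str, str]:
    return (f"A detailed answer to: {prompt}", "No.")


def toy_featurize(prompt: str, response: str) -> List[float]:
    # Single feature: response length in tens of characters.
    return [len(response) / 10.0]


if __name__ == "__main__":
    # Cluster 0 prefers longer answers, cluster 1 prefers shorter ones.
    pairs = synthesize(["What is DP?", "Define zCDP."], toy_candidates,
                       toy_featurize, thetas=[[2.0], [-2.0]], weights=[0.7, 0.3])
    for p in pairs:
        print(p)
```

Sampling a cluster per prompt is one simple way to let opposing preferences (here, for long versus short answers) coexist in the synthetic set. Other designs could, for example, keep the cluster label as conditioning information for the downstream alignment step.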
