DPPrefSyn: Differentially Private Synthetic Data for LLM Alignment
Researchers have introduced DPPrefSyn, an algorithm that generates differentially private synthetic preference data for the privacy-preserving alignment of large language models (LLMs). The work addresses a privacy risk in post-training on real human preference data: both the user prompts and the human judgments in such data may contain confidential details. DPPrefSyn builds on the Bradley-Terry preference model and the geometric structure of pairwise preference data. It first learns a preference model from the private data under formal differential privacy guarantees, then combines that model with public prompts to synthesize high-quality preference data. To represent diverse human preferences, it exploits the shared linear structure of per-cluster reward models. The paper is available on arXiv under the identifier 2605.30808.
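To make the first stage concrete, here is a minimal sketch of fitting a Bradley-Terry model privately. It assumes a linear reward r(x, y) = θ·φ(x, y), so the probability that response a is preferred over response b is σ(θ·(φ_a - φ_b)), where σ is the logistic function. The private training step is an assumption: it uses DP-SGD-style noisy gradient descent (per-pair gradient clipping plus Gaussian noise), which may differ from the mechanism DPPrefSyn actually uses. It also omits privacy accounting, so it reports no (ε, δ) guarantee. The feature vectors φ and all function names (`fit_dp_bt`, `bt_prob`, `make_toy_pairs`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bt_prob(theta: Sequence[float], phi_a: Sequence[float], phi_b: Sequence[float]) -> float:
    """Bradley-Terry probability that response a beats b under r = theta . phi."""
    return sigmoid(sum(t * (a - b) for t, a, b in zip(theta, phi_a, phi_b)))


def clip(vec: Vector, max_norm: float) -> Vector:
    # Rescale vec so that its L2 norm is at most max_norm.
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def fit_dp_bt(pairs: Sequence[Tuple[Vector, Vector]], dim: int, steps: int = 200,
              lr: float = 0.5, clip_norm: float = 1.0, noise_mult: float = 1.0,
              seed: int = 0) -> Vector:
    """Noisy full-batch gradient descent on the Bradley-Terry log-loss.

    pairs holds (phi_chosen, phi_rejected) feature vectors. Each per-pair
    gradient is clipped to clip_norm, and Gaussian noise with standard
    deviation noise_mult * clip_norm is added to their sum before averaging
    (the DP-SGD recipe). No (epsilon, delta) accounting is done here."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    n = len(pairs)
    for _ in range(steps):
        total = [0.0] * dim
        for phi_w, phi_l in pairs:
            d = [a - b for a, b in zip(phi_w, phi_l)]
            p = sigmoid(sum(t * x for t, x in zip(theta, d)))
            # Gradient of -log sigmoid(theta . d) with respect to theta.
            g = clip([-(1.0 - p) * x for x in d], clip_norm)
            total = [s + gi for s, gi in zip(total, g)]
        noisy = [s + rng.gauss(0.0, noise_mult * clip_norm) for s in total]
        theta = [t - lr * s / n for t, s in zip(theta, noisy)]
    return theta


def make_toy_pairs(true_theta: Vector, n: int, seed: int = 0) -> List[Tuple[Vector, Vector]]:
    # Toy stand-in for private data: random features ordered by a known reward.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        a = [rng.gauss(0.0, 1.0) for _ in true_theta]
        b = [rng.gauss(0.0, 1.0) for _ in true_theta]
        r_a = sum(t * x for t, x in zip(true_theta, a))
        r_b = sum(t * x for t, x in zip(true_theta, b))
        pairs.append((a, b) if r_a >= r_b else (b, a))
    return pairs
```

For example, `fit_dp_bt(make_toy_pairs([2.0, -1.0], 200, seed=1), dim=2)` should recover weights with the same signs as the toy reward. Clipping bounds how much any single pair can move each update, which is what the Gaussian noise is calibrated against. Raising `noise_mult` strengthens privacy at the cost of a noisier estimate, and a real deployment would set it with a privacy accountant.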
Key facts
- DPPrefSyn is a novel algorithm for generating differentially private synthetic preference data.
- It is grounded in the Bradley-Terry preference model and the geometric structure of pairwise preference data.
- The algorithm learns a preference model from private data with differential privacy guarantees.
- It uses public prompts to synthesize high-quality preference data.
- It exploits the shared linear structure of per-cluster reward models.
- The work addresses privacy concerns in LLM post-training on human preference data.
- The paper is available on arXiv with ID 2605.30808.
- The approach aims to protect sensitive user prompts and human judgments.
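The synthesis stage, which labels public prompts with the learned model, might look like the following minimal sketch. It assumes one linear reward vector per preference cluster, built as θ_k = Σ_j w_kj b_j from a set of basis vectors b_j shared by all clusters. This is one hypothetical reading of "shared linear structure", not the paper's stated parameterization. Candidate responses and their feature vectors are assumed to be given. Provided the basis and coefficients were all learned under differential privacy, the synthetic labels depend on the private data only through those parameters, so they inherit the same guarantee by the post-processing property of differential privacy. The names `cluster_reward_weights`, `synthesize_pair`, and `synthesize_dataset` are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cluster_reward_weights(basis: Sequence[Vector], coeffs: Sequence[float]) -> Vector:
    """Per-cluster reward vector theta_k = sum_j coeffs[j] * basis[j].

    Hypothetical parameterization: all clusters share the same basis vectors
    and differ only in their low-dimensional coefficients."""
    theta = [0.0] * len(basis[0])
    for b, c in zip(basis, coeffs):
        for i, bi in enumerate(b):
            theta[i] += c * bi
    return theta


def synthesize_pair(prompt: str, resp_a: str, resp_b: str,
                    phi_a: Vector, phi_b: Vector, theta: Vector,
                    rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Label one public prompt's response pair with a linear Bradley-Terry model.

    With rng=None the higher-reward response is chosen; otherwise the label is
    sampled, with response a winning with probability sigmoid(r_a - r_b)."""
    r_a, r_b = dot(theta, phi_a), dot(theta, phi_b)
    if rng is None:
        a_wins = r_a >= r_b
    else:
        a_wins = rng.random() < sigmoid(r_a - r_b)
    chosen, rejected = (resp_a, resp_b) if a_wins else (resp_b, resp_a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def synthesize_dataset(items: Sequence[Tuple[str, int, str, str, Vector, Vector]],
                       thetas: Dict[int, Vector],
                       rng: Optional[random.Random] = None) -> List[Dict[str, str]]:
    """items holds (prompt, cluster_id, resp_a, resp_b, phi_a, phi_b) tuples;
    each pair is scored with its own cluster's reward vector."""
    return [synthesize_pair(p, a, b, fa, fb, thetas[k], rng)
            for p, k, a, b, fa, fb in items]
```

Passing a seeded `random.Random` samples labels from the Bradley-Terry probabilities, which preserves some of the disagreement a real annotator pool would show. Omitting it yields deterministic, higher-reward-wins labels.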
Entities
Institutions
- arXiv