Critique-Driven Reasoning Alignment for LLM Personalization
Critique-Driven Reasoning Alignment (CDRA) is a new approach to aligning Large Language Models (LLMs) with user preferences that reframes alignment from reward-matching to structured reasoning. Personalized alignment poses a dual challenge: inferring users' deep implicit preferences (unstated goals, semantic context, risk tolerances) and performing defensive reasoning in ambiguous real-world scenarios. Because current alignment methods do not bridge this cognitive gap, they produce superficial and brittle responses. CDRA addresses the gap with the DeepPref benchmark, a dataset of 3000 preference-query pairs spanning 20 topics, generated by a simulated multi-faceted cognitive council that produces critique-annotated reasoning chains. The work is published on arXiv under identifier 2510.11194.
Key facts
- CDRA reframes alignment as a structured reasoning process.
- DeepPref benchmark contains 3000 preference-query pairs.
- Pairs cover 20 topics.
- Data is curated by a simulated multi-faceted cognitive council.
- Council produces critique-annotated reasoning chains.
- Method addresses inference of deep implicit preferences.
- Method includes defensive reasoning for ambiguity.
- Published on arXiv with ID 2510.11194.
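The source does not publish a schema for DeepPref records or the council's interface, but the facts above can be sketched as a hypothetical data structure: each preference-query pair carries a reasoning chain, and a set of critic personas (standing in for the multi-faceted cognitive council) annotates every step with critiques. All names and fields below are illustrative assumptions, not the paper's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningStep:
    thought: str    # one step in the reasoning chain
    critique: str   # council critiques annotating this step

@dataclass
class PreferenceQueryPair:
    topic: str        # one of the benchmark's 20 topics (assumed label)
    preference: str   # the user's (possibly implicit) preference
    query: str        # the query to answer under that preference
    chain: list[ReasoningStep] = field(default_factory=list)

def council_annotate(thoughts, critics):
    """Toy stand-in for the simulated cognitive council:
    every critic persona reviews every reasoning step."""
    return [
        ReasoningStep(t, " | ".join(c(t) for c in critics))
        for t in thoughts
    ]

# Purely illustrative critic personas, one per "facet".
critics = [
    lambda s: f"risk: does '{s}' respect the user's risk tolerance?",
    lambda s: f"context: is '{s}' grounded in the stated scenario?",
]

pair = PreferenceQueryPair(
    topic="travel",
    preference="prefers low-risk itineraries",
    query="Plan a weekend trip.",
    chain=council_annotate(["Suggest nearby destinations"], critics),
)
print(pair.chain[0].critique)
```

A real pipeline would replace the lambda critics with LLM calls prompted as distinct council facets, but the shape of the output, a chain where each step carries its critiques, is the same.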
Entities
Institutions
- arXiv