Pref-CTRL: Preference-Driven LLM Alignment via Representation Editing
Researchers propose Pref-CTRL, a novel test-time alignment method for large language models that uses a multi-objective value function, trained on preference data, to edit internal representations during inference. Unlike the prior method RE-Control, which relies on a single value function, Pref-CTRL better captures the pairwise structure of human preferences between candidate responses. The method outperforms RE-Control on two benchmark datasets and generalizes better to out-of-domain data. The source code is publicly available.
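The summary states that the value function is trained on preference data, i.e. pairs of chosen and rejected responses. A standard way to fit a value head to such pairs is a Bradley-Terry style pairwise loss; the sketch below is a minimal, hypothetical illustration with a linear value head on synthetic hidden states, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "hidden states" for chosen/rejected responses (dim 8). In the
# paper's setting these would come from the LLM; here they are synthetic,
# with chosen states shifted along a fixed preference direction.
dim, n = 8, 256
direction = rng.normal(size=dim)
rejected = rng.normal(size=(n, dim))
chosen = rejected + 0.5 * direction + 0.2 * rng.normal(size=(n, dim))

w = np.zeros(dim)  # linear value head: v(h) = w @ h
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Bradley-Terry pairwise loss: -log sigmoid(v(chosen) - v(rejected)),
# minimized by gradient descent on w.
for _ in range(200):
    margin = (chosen - rejected) @ w
    p = sigmoid(margin)
    grad = -(((1.0 - p)[:, None]) * (chosen - rejected)).mean(axis=0)
    w -= lr * grad

# Fraction of pairs where the trained head ranks chosen above rejected.
acc = float(((chosen - rejected) @ w > 0).mean())
print(f"pairwise accuracy: {acc:.2f}")
```

This is what "capturing the pairwise structure of preferences" amounts to: the head is trained only on response comparisons, never on absolute reward labels.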
Key facts
- Pref-CTRL is a test-time alignment method for LLMs.
- It uses a multi-objective value function trained on preference data.
- It edits internal representations during inference.
- It outperforms RE-Control on two benchmark datasets.
- It shows greater generalization on out-of-domain datasets.
- The source code is available.
- The paper is on arXiv with ID 2604.23543.
- RE-Control uses a single value function and gradient-based editing.
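The facts above describe gradient-based editing of internal representations guided by a value function. A minimal sketch of that idea, under assumed details (linear value heads, a fixed preference-weight vector, plain gradient ascent on a hidden state), might look like this; none of the names or hyperparameters below come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

# Hypothetical linear value heads for two objectives (e.g. helpfulness,
# harmlessness); in Pref-CTRL these would be learned from preference data.
W = rng.normal(size=(2, dim))   # one row of weights per objective
prefs = np.array([0.7, 0.3])    # assumed user-chosen objective weights

def value(h):
    """Scalarized multi-objective value of a hidden state h."""
    return float(prefs @ (W @ h))

def edit(h, alpha=0.1, steps=5):
    """Gradient-ascent edit of a hidden state toward higher value.

    For a linear value head the gradient w.r.t. h is constant (prefs @ W),
    so each step moves h a fixed amount along that direction.
    """
    grad = prefs @ W
    for _ in range(steps):
        h = h + alpha * grad
    return h

h = rng.normal(size=dim)        # stand-in for one decoding step's hidden state
h_edited = edit(h)
print(value(h), value(h_edited))
```

At inference time such an edit would be applied to the model's hidden state at each decoding step before the next token is sampled, steering generation without updating any model weights.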