ARTFEED — Contemporary Art Intelligence

Pref-CTRL: Preference-Driven LLM Alignment via Representation Editing

ai-technology · 2026-04-29

Researchers propose Pref-CTRL, a test-time alignment method for large language models that edits internal representations during inference, guided by a multi-objective value function trained on preference data. Unlike the prior method RE-Control, which relies on a single value function, Pref-CTRL's multi-objective formulation better captures the pairwise structure of human preferences between candidate responses. The method outperforms RE-Control on two benchmark datasets and generalizes better to out-of-domain data. The source code is publicly available.

Key facts

  • Pref-CTRL is a test-time alignment method for LLMs.
  • It uses a multi-objective value function trained on preference data.
  • It edits internal representations during inference.
  • It outperforms RE-Control on two benchmark datasets.
  • It shows greater generalization on out-of-domain datasets.
  • The source code is available.
  • The paper is on arXiv with ID 2604.23543.
  • RE-Control uses a single value function and gradient-based editing.
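The representation-editing idea summarized above can be illustrated with a minimal sketch. This is not the authors' code: the linear value heads, preference weights, and step size are all hypothetical stand-ins. A value function scores a hidden state, and the state is nudged along the gradient of a preference-weighted objective before decoding continues.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 2                      # hidden size, number of objectives (hypothetical)

W = rng.normal(size=(k, d))       # one linear value head per objective (stand-in)
pref = np.array([0.7, 0.3])       # preference weights over objectives (assumed)

def value(h):
    """Preference-weighted multi-objective value of hidden state h."""
    return pref @ (W @ h)

def edit(h, step=0.1):
    """One gradient-ascent edit of h.

    For a linear value function the gradient with respect to h is
    simply pref @ W, so the edit moves h a fixed step in that direction.
    """
    grad = pref @ W
    return h + step * grad

h = rng.normal(size=d)            # a hidden state produced during decoding
h_edited = edit(h)
assert value(h_edited) > value(h) # the edit strictly increases the value
```

With a single value head (k = 1), this reduces to the RE-Control-style setup the summary contrasts against; the multi-objective version simply combines several heads under the preference weights.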

Entities

Institutions

  • arXiv

Sources