Probing Persona-Dependent Preferences in Large Language Models
A recent study posted to arXiv (2605.13339) examines how large language models (LLMs) represent preferences across personas. The researchers trained linear probes on the residual-stream activations of Gemma-3-27B and Qwen-3.5-122B to predict pairwise task choices. They identified a preference vector that consistently tracks the model's choices across different prompts and contexts, and steering along this vector in Gemma-3-27B causally controls which option the model picks. Importantly, this preference representation is largely shared across personas: a probe trained on the helpful-assistant persona both predicts and steers the choices of very different personas. These results suggest that LLMs may rely on a single internal preference system despite surface-level behavioral variation.
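The following is a minimal sketch of the linear-probe idea described above, using synthetic stand-in data rather than real activations; the hidden-state width, layer choice, and dataset size are illustrative assumptions, not details from the paper.

```python
# Sketch: fit a linear probe on (stand-in) residual-stream activations to
# predict a binary pairwise choice, then read off a candidate preference direction.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

d_model = 512        # hypothetical residual-stream width
n_examples = 2000    # hypothetical number of pairwise-choice prompts

# One row per prompt; label 1 if the model picked option A, 0 if option B.
X = rng.normal(size=(n_examples, d_model))
true_direction = rng.normal(size=d_model)
y = (X @ true_direction + 0.5 * rng.normal(size=n_examples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(f"held-out accuracy: {probe.score(X_test, y_test):.3f}")

# The normalized weight vector is the candidate "preference vector".
preference_vector = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
```

In this setup, cross-persona transfer would amount to fitting the probe on activations gathered under one persona prompt and evaluating it on activations gathered under another.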
Key facts
- Study published on arXiv with ID 2605.13339
- Models used: Gemma-3-27B and Qwen-3.5-122B
- Linear probes trained on residual-stream activations
- Preference vector identified that tracks choices across prompts
- Steering along the preference vector causally controls pairwise choice on Gemma-3-27B (sketched after this list)
- Preference representation is shared across personas
- Probe trained on helpful assistant predicts choices of other personas
- Research explores internal implementation of persona-dependent preferences
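Below is a minimal sketch of activation steering along a preference direction using a PyTorch forward hook. The toy model, layer index, and scale `alpha` are placeholders; this only approximates the kind of intervention reported for Gemma-3-27B.

```python
# Sketch: add alpha * preference_vector to a layer's output via a forward hook,
# shifting the "residual stream" of a toy model along the preference direction.
import torch
import torch.nn as nn

d_model = 512
preference_vector = torch.randn(d_model)
preference_vector = preference_vector / preference_vector.norm()

# Toy stand-in for a transformer's stack of residual-stream layers.
model = nn.Sequential(*[nn.Linear(d_model, d_model) for _ in range(4)])

def make_steering_hook(direction: torch.Tensor, alpha: float):
    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the layer's output.
        return output + alpha * direction
    return hook

# Positive alpha pushes toward one option of the pair, negative toward the other.
layer_idx = 2
handle = model[layer_idx].register_forward_hook(
    make_steering_hook(preference_vector, alpha=4.0)
)

x = torch.randn(1, d_model)
steered_out = model(x)
handle.remove()
unsteered_out = model(x)
print("output shift norm:", (steered_out - unsteered_out).norm().item())
```

In the cross-persona setting described above, the same direction (fit on the helpful-assistant persona) would be added while the model is prompted with a different persona.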
Entities
Institutions
- arXiv