ARTFEED — Contemporary Art Intelligence

Probing Persona-Dependent Preferences in Large Language Models

ai-technology · 2026-05-14

A recent study published on arXiv (2605.13339) examines how large language models (LLMs) represent preferences across personas. The researchers trained linear probes on the residual-stream activations of Gemma-3-27B and Qwen-3.5-122B to predict pairwise task choices. They identified a preference vector that tracks the models' choices consistently across prompts and contexts. Notably, steering along this vector causally controls pairwise choices in Gemma-3-27B. The preference representation is also largely shared across personas: a probe trained on a helpful-assistant persona predicts, and steers, the choices of markedly different personas. These results suggest that LLMs may rely on a unified internal preference representation despite surface-level behavioral differences between personas.
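The probe-and-steer recipe described above can be illustrated with a toy sketch. The study's actual models and data are not reproduced here; the "activations" below are synthetic vectors whose labels depend on a hidden direction, standing in for residual-stream activations. All dimensions, sample counts, and the steering coefficient are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 500  # toy activation dimensionality and sample count (assumptions)

# Synthetic stand-in for residual-stream activations: a hidden "true"
# direction determines which of two tasks would be chosen.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
X = rng.normal(size=(n, d))
y = (X @ true_dir > 0).astype(float)  # 1 = task A chosen, 0 = task B

# Linear probe: logistic regression fit by plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted P(task A)
    w -= 1.0 * (X.T @ (p - y)) / n       # gradient step on mean log-loss

acc = float(((X @ w > 0) == (y > 0.5)).mean())
pref_vec = w / np.linalg.norm(w)         # recovered "preference vector"
cos_sim = float(pref_vec @ true_dir)     # alignment with the hidden direction

# Steering: push an activation along the preference vector and watch the
# probe's predicted choice flip from task B to task A.
i = int(np.argmin(X @ w))                # activation most strongly predicting B
steered = X[i] + 10.0 * pref_vec         # steering coefficient is arbitrary
flipped = bool(X[i] @ w < 0 and steered @ w > 0)
```

In this toy setting the probe's weight vector recovers the hidden direction, and adding a multiple of it to an activation flips the probe's predicted choice, mirroring the paper's causal-steering result at a much smaller scale.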

Key facts

  • Study published on arXiv with ID 2605.13339
  • Models used: Gemma-3-27B and Qwen-3.5-122B
  • Linear probes trained on residual-stream activations
  • Preference vector identified that tracks choices across prompts
  • Steering along preference vector causally controls pairwise choice on Gemma-3-27B
  • Preference representation is shared across personas
  • Probe trained on helpful assistant predicts choices of other personas
  • Research explores internal implementation of persona-dependent preferences

Entities

Institutions

  • arXiv

Sources