LLMs' Internal Reasoning as a Polylogue of Persona Vectors
A recent arXiv paper (2605.09159) proposes that large language models (LLMs) encode behavioural traits as "persona vectors": linear directions in activation space. These vectors can be tracked in real time during generation as a "polylogue", the time series of alignments between the persona vectors and the model's hidden states. Experiments on four open-weight models show that polylogue features predict correctness on MMLU-Pro competitively with low-dimensional activation baselines while remaining interpretable. The polylogue also suggests where to steer: a paragraph-conditioned intervention that adjusts latent directions at different stages of the response and improves accuracy.
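The alignment computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes persona vectors are linear probes in the same space as the hidden states, and uses cosine similarity as the alignment measure; the function name and array shapes are my own.

```python
import numpy as np

def polylogue(hidden_states: np.ndarray, persona_vectors: np.ndarray) -> np.ndarray:
    """Cosine alignment between each hidden state and each persona vector.

    hidden_states:   (T, d) one hidden vector per generated token
    persona_vectors: (K, d) one linear direction per behavioural trait
    returns:         (T, K) alignment time series (the "polylogue")
    """
    # Normalise rows so the dot product becomes cosine similarity.
    h = hidden_states / np.linalg.norm(hidden_states, axis=1, keepdims=True)
    v = persona_vectors / np.linalg.norm(persona_vectors, axis=1, keepdims=True)
    return h @ v.T

rng = np.random.default_rng(0)
traces = polylogue(rng.normal(size=(16, 64)), rng.normal(size=(3, 64)))
print(traces.shape)  # (16, 3): 16 tokens tracked against 3 persona directions
```

Summary statistics of `traces` (means, drifts, per-paragraph aggregates) would then serve as the features fed to a correctness predictor.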
Key facts
- arXiv paper 2605.09159
- LLMs encode behavioural traits as persona vectors
- Persona vectors are linear directions in activation space
- Polylogue is the time series of alignments between persona vectors and hidden activations
- Experiments on four open-weight models
- Polylogue features predict correctness on MMLU-Pro
- Competitive with low-dimensional activation baselines
- Paragraph-conditioned intervention improves accuracy
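The paragraph-conditioned intervention in the last bullet can be sketched as activation steering restricted to a token span. This is an assumed mechanism, not code from the paper: `steer_span`, the span convention, and the scale `alpha` are all illustrative.

```python
import numpy as np

def steer_span(hidden_states: np.ndarray, persona_vector: np.ndarray,
               span: tuple[int, int], alpha: float) -> np.ndarray:
    """Shift hidden states toward a persona direction within [start, end).

    Only the tokens of the targeted paragraph are modified; the rest of the
    sequence passes through unchanged.
    """
    out = hidden_states.astype(float).copy()
    start, end = span
    out[start:end] += alpha * persona_vector
    return out

h = np.zeros((8, 4))          # 8 tokens, hidden size 4
v = np.ones(4)                # hypothetical persona direction
steered = steer_span(h, v, span=(2, 5), alpha=0.5)
print(steered[3])  # [0.5 0.5 0.5 0.5] -- inside the span
print(steered[0])  # [0. 0. 0. 0.]     -- outside the span, untouched
```

Conditioning on the paragraph simply means choosing `span` (and possibly `alpha`) per stage of the response rather than applying one global shift.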