LLMs' Internal Reasoning as a Polylogue of Persona Vectors
A recent arXiv paper (2605.09159) proposes that large language models (LLMs) encode behavioural traits as "persona vectors": linear directions in activation space. These vectors can be tracked in real time during generation as a "polylogue", the time series of alignments between the persona vectors and the model's hidden states. Experiments on four open-weight models show that polylogue features predict correctness on MMLU-Pro competitively with low-dimensional activation baselines while remaining interpretable. The polylogue also suggests where to steer: a paragraph-conditioned intervention that adjusts latent directions at different stages of the response and improves accuracy.
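The alignment computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes persona vectors are linear probes in the same space as the hidden states, and uses cosine similarity as the alignment measure; the function name and array shapes are my own.

```python
import numpy as np

def polylogue(hidden_states: np.ndarray, persona_vectors: np.ndarray) -> np.ndarray:
    """Cosine alignment between each hidden state and each persona vector.

    hidden_states:   (T, d) one hidden vector per generated token
    persona_vectors: (K, d) one linear direction per behavioural trait
    returns:         (T, K) alignment time series (the "polylogue")
    """
    # Normalise rows so the dot product becomes cosine similarity.
    h = hidden_states / np.linalg.norm(hidden_states, axis=1, keepdims=True)
    v = persona_vectors / np.linalg.norm(persona_vectors, axis=1, keepdims=True)
    return h @ v.T

rng = np.random.default_rng(0)
traces = polylogue(rng.normal(size=(16, 64)), rng.normal(size=(3, 64)))
print(traces.shape)  # (16, 3): 16 tokens tracked against 3 persona directions
```

Summary statistics of `traces` (means, drifts, per-paragraph aggregates) would then serve as the features fed to a correctness predictor.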
Key facts
- arXiv paper 2605.09159
- LLMs encode behavioural traits as persona vectors
- Persona vectors are linear directions in activation space
- Polylogue is the time series of alignments between persona vectors and hidden activations
- Experiments on four open-weight models
- Polylogue features predict correctness on MMLU-Pro
- Competitive with low-dimensional activation baselines
- Paragraph-conditioned intervention improves accuracy
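The paragraph-conditioned intervention in the last bullet can be sketched as activation steering restricted to a token span. This is an assumed mechanism, not code from the paper: `steer_span`, the span convention, and the scale `alpha` are all illustrative.

```python
import numpy as np

def steer_span(hidden_states: np.ndarray, persona_vector: np.ndarray,
               span: tuple[int, int], alpha: float) -> np.ndarray:
    """Shift hidden states toward a persona direction within [start, end).

    Only the tokens of the targeted paragraph are modified; the rest of the
    sequence passes through unchanged.
    """
    out = hidden_states.astype(float).copy()
    start, end = span
    out[start:end] += alpha * persona_vector
    return out

h = np.zeros((8, 4))          # 8 tokens, hidden size 4
v = np.ones(4)                # hypothetical persona direction
steered = steer_span(h, v, span=(2, 5), alpha=0.5)
print(steered[3])  # [0.5 0.5 0.5 0.5] -- inside the span
print(steered[0])  # [0. 0. 0. 0.]     -- outside the span, untouched
```

Conditioning on the paragraph simply means choosing `span` (and possibly `alpha`) per stage of the response rather than applying one global shift.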