ARTFEED — Contemporary Art Intelligence

Persona Vectors Form Early in LLM Pretraining

ai-technology · 2026-05-14

A new study posted on arXiv traces the formation of persona vectors (linear directions in a model's internal activations that correspond to high-level behaviors such as sycophancy) across the pretraining of OLMo-3-7B. These vectors emerge within the first 0.22% of pretraining and remain effective for steering the fully post-trained instruct model. Although the core representations form early, they continue to refine geometrically and semantically throughout training. The research addresses a gap in AI-safety interpretability, since persona vectors are routinely used to inspect and steer model behavior.
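As a rough illustration of the idea (a minimal sketch with toy, randomly generated data; the study's actual extraction method may differ), a persona vector can be estimated as the difference of mean hidden activations between trait-exhibiting and neutral responses, and steering then adds a scaled copy of that direction to a hidden state:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size; real models use thousands of dimensions

# Stand-in activations. In practice these would be residual-stream
# activations collected from a model on contrastive prompt sets.
trait_direction = rng.normal(size=d)
trait_acts = rng.normal(size=(16, d)) + 2.0 * trait_direction  # trait-exhibiting
neutral_acts = rng.normal(size=(16, d))                        # neutral

# Persona vector: normalized difference of the two activation means.
persona_vec = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)
persona_vec /= np.linalg.norm(persona_vec)

def steer(hidden, direction, alpha):
    """Shift a hidden state along the persona direction by strength alpha."""
    return hidden + alpha * direction

h = rng.normal(size=d)
h_steered = steer(h, persona_vec, alpha=4.0)

# The projection onto the persona direction increases by exactly alpha,
# since persona_vec is unit-norm.
print(float(h_steered @ persona_vec - h @ persona_vec))  # ≈ 4.0
```

The same additive intervention is what makes the paper's early-formed vectors testable on the post-trained instruct model: a direction extracted at one training stage can simply be added to activations at another.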

Key facts

  • Persona vectors form within the first 0.22% of OLMo-3 pretraining.
  • Vectors remain effective for steering fully post-trained instruct models.
  • Core representations refine geometrically and semantically throughout training.
  • Study addresses interpretability gap in AI safety.
  • Persona vectors correspond to traits like evil or sycophancy.
  • Research uses OLMo-3-7B model.
  • Findings published on arXiv.
  • Vectors are linear directions in internal activations.

Entities

Institutions

  • arXiv

Sources