ARTFEED — Contemporary Art Intelligence

Anchor-Projected Representations Enable Cross-Model Behavioral Axis Transfer

ai-technology · 2026-05-12

A new framework called anchor-projection allows behavioral directions to be transferred across different large language model families without fine-tuning. The method maps hidden representations into a shared anchor coordinate space (ACS), where directions from source models are averaged into a canonical direction and then reconstructed in new models. Evaluated on five instruction-tuned families (Llama, Qwen, Mistral, Phi, and one other) across ten behavioral axes, the approach shows tight same-axis alignment within the LQMP cluster, reaching 0.83 ten-way detection accuracy on held-out targets. The paper is available on arXiv as preprint 2605.09875.

Key facts

  • Anchor-projection framework maps hidden representations into a shared anchor coordinate space (ACS).
  • Behavioral directions from source models are projected into ACS and averaged into a canonical direction.
  • For a new model, the canonical direction is reconstructed using only anchor activations, without fine-tuning.
  • Evaluated on five instruction-tuned model families: Llama, Qwen, Mistral, Phi, and one other.
  • Ten behavioral axes were tested.
  • Same-axis directions align tightly across the LQMP cluster (Llama, Qwen, Mistral, Phi) in ACS.
  • Held-out targets achieved 0.83 ten-way detection accuracy for the aligned LQMP cluster.
  • The paper is available on arXiv with ID 2605.09875.
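The pipeline in the facts above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: it assumes ACS coordinates are obtained by least-squares projection onto each model's activations for a shared set of anchor prompts, and all names (`to_acs`, `from_acs`), dimensions, and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_acs(direction, anchor_acts):
    """Project a model-specific behavioral direction into the shared
    anchor coordinate space (ACS) via least squares against that
    model's anchor activations."""
    # anchor_acts: (n_anchors, d_model); direction: (d_model,)
    coords, *_ = np.linalg.lstsq(anchor_acts.T, direction, rcond=None)
    return coords  # shared coordinates, shape (n_anchors,)

def from_acs(coords, anchor_acts):
    """Reconstruct a direction in a target model's hidden space from
    ACS coordinates, using only that model's anchor activations."""
    return anchor_acts.T @ coords  # shape (d_model,)

# Toy setup: two "source models" with different hidden sizes and a
# target model, all evaluated on the same n_anchors anchor prompts.
n_anchors = 8
d_a, d_b, d_tgt = 16, 24, 32
anchors_a = rng.normal(size=(n_anchors, d_a))
anchors_b = rng.normal(size=(n_anchors, d_b))
anchors_tgt = rng.normal(size=(n_anchors, d_tgt))

# Behavioral directions found separately in each source model.
dir_a = rng.normal(size=d_a)
dir_b = rng.normal(size=d_b)

# Project each source direction into ACS and average into a
# canonical direction for this behavioral axis...
canonical = np.mean([to_acs(dir_a, anchors_a),
                     to_acs(dir_b, anchors_b)], axis=0)

# ...then reconstruct it in the target model with no fine-tuning.
transferred = from_acs(canonical, anchors_tgt)
print(transferred.shape)  # a direction in the target's hidden space
```

The key property the sketch preserves is that the canonical direction lives only in the shared `n_anchors`-dimensional coordinate space, so transfer to a new model needs nothing but that model's anchor activations.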

Entities

Institutions

  • arXiv

Sources