Anchor-Projected Representations Enable Cross-Model Behavioral Axis Transfer
A new framework, anchor projection, transfers behavioral directions across large language model families without fine-tuning. The method maps hidden representations into a shared anchor coordinate space (ACS), where per-model directions are averaged into a single canonical direction that can be reconstructed in a new model from its anchor activations alone. Evaluated on five instruction-tuned families (Llama, Qwen, Mistral, Phi, and others) across ten behavioral axes, the approach shows tight same-axis alignment within the LQMP cluster (Llama, Qwen, Mistral, Phi), achieving 0.83 ten-way detection accuracy on held-out targets. The paper is available on arXiv as preprint 2605.09875.
Key facts
- Anchor-projection framework maps hidden representations into a shared anchor coordinate space (ACS).
- Behavioral directions extracted from source models are projected into ACS and averaged into a single canonical direction per axis.
- For a new model, the canonical direction is reconstructed using only anchor activations, without fine-tuning.
- Evaluated on five instruction-tuned model families: Llama, Qwen, Mistral, Phi, and others.
- Ten behavioral axes were tested.
- Same-axis directions align tightly across the LQMP cluster (Llama, Qwen, Mistral, Phi) in ACS.
- Within the aligned LQMP cluster, reconstructed directions achieve 0.83 ten-way detection accuracy on held-out target models.
- The paper is available on arXiv with ID 2605.09875.
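The pipeline in the key facts above — project source directions into the anchor space, average them into a canonical direction, then reconstruct in a new model from its anchor activations alone — can be sketched numerically. This is an illustrative guess at the mechanics, not the paper's implementation: the projection here uses cosine similarity to anchor activations and the reconstruction uses least squares, and all model names, dimensions, and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_acs(direction, anchors):
    # ACS coordinates of a model-space direction: its similarity to each
    # anchor activation. (Illustrative choice; the paper's exact projection
    # may differ.)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    d = direction / np.linalg.norm(direction)
    return a @ d  # shape: (n_anchors,)

def from_acs(coords, anchors):
    # Reconstruct a model-space direction from ACS coordinates using only
    # that model's anchor activations, via least squares -- no fine-tuning.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    v, *_ = np.linalg.lstsq(a, coords, rcond=None)
    return v / np.linalg.norm(v)

# Toy setup: two "source" models with different hidden sizes and one
# "target" model; anchors stand in for activations on shared anchor prompts.
n_anchors = 64
anchors = {m: rng.normal(size=(n_anchors, d))
           for m, d in [("llama", 128), ("qwen", 96), ("target", 112)]}

# Hypothetical per-model behavioral directions for one axis.
dirs = {m: rng.normal(size=anchors[m].shape[1]) for m in ("llama", "qwen")}

# Project each source direction into ACS and average into a canonical direction.
canonical = np.mean([to_acs(dirs[m], anchors[m]) for m in dirs], axis=0)

# Reconstruct the canonical direction in the target model's own space.
target_dir = from_acs(canonical, anchors["target"])
print(target_dir.shape)  # (112,)
```

Note that the reconstruction only ever touches the target model's anchor activations, which is what makes the transfer possible without gradients or fine-tuning on the target.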
Entities
Institutions
- arXiv