Polymorphism in Transformer Models Revealed and Resolved via Procrustes Rotation

ai-technology · 2026-05-26

A recent study published on arXiv (2605.24577) reports that independently trained transformers compute the same function in residual-stream bases that differ by a uniform random rotation on SO(d_model), a phenomenon called polymorphism. The models implement the same function, but their internal coordinates are mutually unintelligible. One matrix multiplication per model pair removes the mismatch: an orthogonal Procrustes fit on a single batch of activations allows sparse-autoencoder (SAE) feature dictionaries and steering vectors to be transferred between independently trained models with no retraining. The phenomenon is invisible to the standard SAE universality metric. Decoder-column cosine similarity matches across seeds at 98%, yet an SAE trained on one seed reconstructs another seed's activations at negative explained variance, which indicates that the decoder columns align while the encoder reads the other seed's activations in a rotated basis. A single Procrustes rotation R restores the reconstruction.

Key facts

Independently trained transformers compute the same function in residual-stream bases that differ by a uniform random rotation on SO(d_model).
The phenomenon is called polymorphism: same function, mutually unintelligible interior coordinates.
One matrix multiplication per model pair removes it: an orthogonal Procrustes fit on a single batch of activations.
The fit transfers sparse-autoencoder (SAE) feature dictionaries and steering vectors between independently trained models with no retraining.
The phenomenon is invisible to the standard SAE universality metric.
Decoder-column cosine similarity matches across seeds at 98%.
An SAE trained on one seed reconstructs another seed's activations at negative explained variance.
A single Procrustes rotation R restores the reconstruction.
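
To make the Procrustes step concrete, here is a minimal NumPy sketch on synthetic data. It is an illustration under stated assumptions, not the paper's code: the two "models" are simulated by rotating one set of activations with a random orthogonal matrix, the "SAE" is a hypothetical toy (an orthonormal dictionary D with a ReLU encoder), and all names (procrustes_rotation, explained_variance, toy_sae, acts_a, acts_b) are invented for this example.

```python
import numpy as np

def procrustes_rotation(acts_src, acts_dst):
    """Orthogonal R minimizing ||acts_src @ R - acts_dst||_F (closed form via SVD)."""
    u, _, vt = np.linalg.svd(acts_src.T @ acts_dst)
    return u @ vt

def explained_variance(x, x_hat):
    """1 - residual sum of squares / total (centered) sum of squares."""
    resid = ((x - x_hat) ** 2).sum()
    total = ((x - x.mean(axis=0)) ** 2).sum()
    return 1.0 - resid / total

def toy_sae(x, dictionary):
    """Hypothetical toy SAE: ReLU encoder and decoder tied to one dictionary."""
    codes = np.maximum(x @ dictionary.T, 0.0)
    return codes @ dictionary

rng = np.random.default_rng(0)
d_model, n_tokens = 16, 512

# Assumed setup: sparse non-negative codes over an orthonormal dictionary D.
D, _ = np.linalg.qr(rng.normal(size=(d_model, d_model)))
codes = rng.exponential(size=(n_tokens, d_model))
codes *= rng.random((n_tokens, d_model)) < 0.2
acts_a = codes @ D                      # "model A" activations

# "Model B" is simulated as the same activations in a rotated basis.
q, _ = np.linalg.qr(rng.normal(size=(d_model, d_model)))
acts_b = acts_a @ q

# One Procrustes fit on a single batch of paired activations.
R = procrustes_rotation(acts_a, acts_b)

# Applying A's dictionary directly to B's activations uses the wrong basis.
ev_raw = explained_variance(acts_b, toy_sae(acts_b, D))

# Transferring the dictionary costs one matrix multiplication.
D_b = D @ R
ev_aligned = explained_variance(acts_b, toy_sae(acts_b, D_b))

# A steering vector (here, one feature direction) transfers the same way.
steer_a = D[0]
steer_b = steer_a @ R

print(f"explained variance without rotation: {ev_raw:.3f}")
print(f"explained variance with rotation:    {ev_aligned:.3f}")
```

In this toy setting the fitted R recovers the simulated rotation, and rotating the dictionary restores reconstruction. Real models are not exact rotations of one another, so the fit there would be approximate; how close it gets is the empirical question the study addresses.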
