UNLOCK: Training-Free Cross-Model Capability Transfer via Linear Subspace Alignment
A recent arXiv paper proposes the Master Key Hypothesis: model capabilities correspond to specific directions in a low-dimensional latent subspace and can therefore be transferred between models through linear alignment. Building on this, the authors introduce UNLOCK, a training-free, label-free framework that identifies a capability direction by contrasting activations from source-model variants with and without that capability. The direction is then aligned to a target model via a low-rank linear transformation and applied at inference time to elicit the behavior. Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, show improvements across model scales without any training. The paper is available at arXiv:2604.06377.
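The direction-extraction step described above can be sketched as a simple difference-of-means over activations. This is only an illustrative reading of "contrasting activations between variants with and without the capability"; the function name, shapes, and random stand-in data are all assumptions, not the paper's implementation.

```python
import numpy as np

def capability_direction(acts_with: np.ndarray, acts_without: np.ndarray) -> np.ndarray:
    """Contrast mean activations of the capability-present variant against the
    capability-absent variant and normalize the difference into a unit direction.
    Inputs are (n_samples, hidden_dim) activation matrices."""
    delta = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return delta / np.linalg.norm(delta)

# Toy usage with random stand-in activations (64 samples, hidden size 16).
rng = np.random.default_rng(0)
with_cap = rng.normal(loc=0.5, size=(64, 16))     # capability-present variant
without_cap = rng.normal(loc=0.0, size=(64, 16))  # capability-absent variant
direction = capability_direction(with_cap, without_cap)
```

The normalization makes the direction's scale a free parameter, which is typically controlled by a steering coefficient at inference time.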
Key facts
- The Master Key Hypothesis states model capabilities correspond to directions in a low-dimensional latent subspace.
- UNLOCK is a training-free and label-free framework for cross-model capability transfer.
- It extracts a capability direction by contrasting activations between capability-present and capability-absent source variants.
- Alignment with the target model uses a low-rank linear transformation.
- Experiments on reasoning behaviors include Chain-of-Thought (CoT) and mathematical reasoning.
- Improvements are demonstrated across model scales without training.
- The paper is published on arXiv with ID 2604.06377.
- The approach is applied at inference time to elicit specific behaviors.
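The alignment and inference-time steps listed above can be sketched as follows. The paper's exact alignment procedure is not reproduced here; this sketch assumes paired source/target activations and uses a closed-form least-squares fit truncated to low rank via SVD. The names `low_rank_align` and `steer`, and all dimensions, are illustrative assumptions.

```python
import numpy as np

def low_rank_align(src_acts: np.ndarray, tgt_acts: np.ndarray, rank: int) -> np.ndarray:
    """Fit a linear map M from source hidden space to target hidden space by
    closed-form least squares on paired activations, then truncate M to the
    requested rank via SVD. Returns M with shape (d_src, d_tgt)."""
    M, *_ = np.linalg.lstsq(src_acts, tgt_acts, rcond=None)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]

def steer(hidden: np.ndarray, src_direction: np.ndarray, M: np.ndarray,
          alpha: float = 1.0) -> np.ndarray:
    """Map the source capability direction into the target space and add it
    (scaled by alpha) to a target hidden state at inference time."""
    d_tgt = src_direction @ M
    return hidden + alpha * d_tgt / np.linalg.norm(d_tgt)

# Toy usage: paired stand-in activations for a source model (d=8) and target (d=6).
rng = np.random.default_rng(1)
src = rng.normal(size=(128, 8))
tgt = rng.normal(size=(128, 6))
M = low_rank_align(src, tgt, rank=2)
steered = steer(rng.normal(size=6), rng.normal(size=8), M)
```

Because both the least-squares fit and the SVD truncation are closed-form, this sketch stays consistent with the framework's training-free premise: no gradients are computed and no model weights are updated.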