Language Models Converge on Representations but Diverge on Reasoning
A new study posted on arXiv (2605.23315) tests the Platonic Representation Hypothesis, the claim that large language models develop similar internal representations regardless of how they are trained or architected. The researchers evaluated 16 models from 8 families, ranging from 1.5 billion to 72 billion parameters, on 800 reasoning problems spanning math, science, commonsense, and truthfulness. Similarity between models is reported as CKA (centered kernel alignment) scores. The study reports three main findings. First, a difficulty inversion: representations converged more on problems the models failed (CKA = 0.897) than on problems they solved (CKA = 0.830). Second, pre-decision representations were aligned across models (CKA = 0.875), while post-decision representations diverged. Third, causally relevant representations were less aligned than causally irrelevant ones. Together, these results suggest that similar representations do not necessarily imply similar reasoning.
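For readers unfamiliar with the metric, the CKA scores quoted here compare two sets of activations collected on the same inputs. The sketch below shows the common linear form of CKA. It is an illustration only: the summary does not say which CKA variant, layers, or token positions the study used, and the function name `linear_cka` and the random stand-in activations are assumptions made for this example.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices.

    x has shape (n_examples, d1) and y has shape (n_examples, d2).
    Rows must correspond to the same inputs; d1 and d2 may differ,
    which is what allows models with different hidden sizes to be compared.
    """
    # Center every feature (column) so the score ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical stand-ins for two models' hidden states on 200 problems.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 16))

# A rotated copy of the same representation: linear CKA treats it as identical.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
acts_rotated = acts_a @ rotation

# Unrelated activations with a different width.
acts_unrelated = rng.normal(size=(200, 32))

print("rotated copy:", linear_cka(acts_a, acts_rotated))
print("unrelated:   ", linear_cka(acts_a, acts_unrelated))
```

A score near 1 means the two representations match up to rotation and uniform scaling, while unrelated activations score much lower. A gap such as 0.897 versus 0.830 is therefore a difference in how closely the models' internal geometry agrees, not a difference in accuracy.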
Key facts
- Study tests Platonic Representation Hypothesis on 16 LLMs from 8 families
- Models range from 1.5B to 72B parameters
- 800 reasoning problems across math, science, commonsense, truthfulness
- Difficulty inversion: representational convergence is higher on failed problems (CKA = 0.897) than on solved problems (CKA = 0.830)
- Pre-decision representations align across models (CKA = 0.875), while post-decision representations diverge
- Causally relevant representations are less aligned than causally irrelevant ones
- Published on arXiv with ID 2605.23315
- arXiv announce type: cross (cross-listed)
Entities
Institutions
- arXiv