Language Models Converge on Representations but Diverge on Reasoning

publication · 2026-05-25

A new study posted on arXiv (2605.23315) tests the Platonic Representation Hypothesis, which holds that models converge toward similar internal representations regardless of how they are trained or how they are built. The researchers evaluated 16 models from 8 families, ranging from 1.5 billion to 72 billion parameters, on 800 reasoning problems spanning math, science, commonsense and truthfulness. They measured representational alignment with centered kernel alignment (CKA), where higher values mean more similar representations, and report three main findings. First, a difficulty inversion: representations converged more on problems the models failed (CKA = 0.897) than on problems they solved (CKA = 0.830). Second, representations formed before the model commits to an answer are aligned (CKA = 0.875), while those formed after that decision diverge. Third, representations that are causally relevant to the answer are less aligned across models than causally irrelevant ones. Taken together, the results suggest that similar representations do not imply similar reasoning.
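
To make the CKA numbers concrete, here is a minimal sketch of linear CKA as used for comparing neural network representations by Kornblith et al. (2019): for column-centered representations X (n x p) and Y (n x q) of the same n problems, CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F). It also averages CKA over all model pairs on a chosen subset of problems, the kind of comparison behind the solved-versus-failed figures. This is an illustration under assumptions, not the study's code: the summary does not say which CKA variant, layer, or token pooling the authors used, so the sketch assumes linear CKA on one pooled vector per problem per model. The helper names (`center_columns`, `cross_fro2`, `linear_cka`, `mean_pairwise_cka`), the model names and the synthetic data are all hypothetical.

```python
import itertools
import random


def center_columns(X):
    """Subtract each feature's mean over the n rows (problems)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def cross_fro2(A, B):
    """Squared Frobenius norm of A^T B for n x p matrix A and n x q matrix B."""
    total = 0.0
    for i in range(len(A[0])):
        for j in range(len(B[0])):
            s = sum(a[i] * b[j] for a, b in zip(A, B))
            total += s * s
    return total


def linear_cka(X, Y):
    """Linear CKA between two representations of the same n problems.

    X is n x p and Y is n x q; p and q may differ (e.g. different models).
    """
    Xc, Yc = center_columns(X), center_columns(Y)
    denom = cross_fro2(Xc, Xc) ** 0.5 * cross_fro2(Yc, Yc) ** 0.5
    return cross_fro2(Xc, Yc) / denom


def mean_pairwise_cka(reps, subset):
    """Average linear CKA over every pair of models, using only the
    problem indices in `subset` (e.g. the problems the models failed)."""
    scores = []
    for a, b in itertools.combinations(sorted(reps), 2):
        Xa = [reps[a][i] for i in subset]
        Xb = [reps[b][i] for i in subset]
        scores.append(linear_cka(Xa, Xb))
    return sum(scores) / len(scores)


# Synthetic stand-ins for pooled hidden states: 20 problems x 4 features.
random.seed(0)
base = [[random.gauss(0, 1) for _ in range(4)] for _ in range(20)]
# Orthogonal transform (swap + sign flip) scaled by 2: linear CKA ignores it.
rotated = [[2 * r[1], -2 * r[0], 2 * r[3], 2 * r[2]] for r in base]
noise = [[random.gauss(0, 1) for _ in range(4)] for _ in range(20)]

reps = {"model_a": base, "model_b": rotated, "model_c": noise}
# Arbitrary halves standing in for a hypothetical failed/solved split.
failed, solved = list(range(10)), list(range(10, 20))
print(round(linear_cka(base, rotated), 6))  # 1.0
print(mean_pairwise_cka(reps, failed), mean_pairwise_cka(reps, solved))
```

Because linear CKA is unchanged by rotations and uniform rescaling of the feature space, the `rotated` copy scores 1.0 against `base`, while the independent `noise` matrix scores below 1. Comparing `mean_pairwise_cka` on a failed subset against a solved subset mirrors, in miniature, the difficulty-inversion comparison the study reports.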

Key facts

Study tests Platonic Representation Hypothesis on 16 LLMs from 8 families
Models range from 1.5B to 72B parameters
800 reasoning problems across math, science, commonsense, truthfulness
Difficulty inversion: convergence higher on failed problems (CKA=0.897) than solved (CKA=0.830)
Pre-decision representations align (CKA=0.875), post-decision representations diverge
Causally relevant representations less aligned than irrelevant ones
Published on arXiv with ID 2605.23315
arXiv announcement type: cross-list
