ARTFEED — Contemporary Art Intelligence

Reasoning Distillation Fails to Transmit Cognitive Structure in LLMs

ai-technology · 2026-04-25

A recent arXiv study indicates that large language models (LLMs) fail to transmit the cognitive structure of reasoning through reasoning distillation. The researchers tested the "Hán Dān Xué Bù" (Superficial Mimicry) hypothesis, named after the Chinese idiom about imitating another's gait and forgetting one's own, across 14 models. They found that teacher models trained with reinforcement learning align closely with human cognitive costs (correlation r=0.64), whereas distilled student models trained with Supervised Fine-Tuning (SFT) suffer a "Functional Alignment Collapse" (r=0.34) and frequently fall below their pre-distillation baselines, an outcome the authors call "Negative Transfer." The findings suggest that SFT produces a "Cargo Cult" effect: students copy the surface form of reasoning without acquiring the teacher's adaptive resource-allocation strategy. The full paper is available at arXiv:2601.05019.
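
To make the headline correlations concrete, here is a minimal Python sketch of how such a functional-alignment score could be computed: a Pearson correlation between a per-item human cognitive cost (e.g., response time) and a model's per-item reasoning cost (e.g., chain-of-thought token count). The cost proxies, the synthetic data, and the noise levels below are illustrative assumptions, not the paper's actual protocol.

    import numpy as np

    # Synthetic illustration only: real values would come from human studies
    # and model traces. Cost proxies here are assumptions, not the paper's.
    rng = np.random.default_rng(0)
    n_items = 500

    # Hypothetical human effort per benchmark item (arbitrary units).
    human_cost = rng.gamma(shape=2.0, scale=3.0, size=n_items)

    # Teacher effort tracks item difficulty; student effort is noisier,
    # mimicking the reported Functional Alignment Collapse.
    teacher_cost = human_cost + rng.normal(0.0, 5.0, size=n_items)
    student_cost = human_cost + rng.normal(0.0, 12.0, size=n_items)

    r_teacher = np.corrcoef(human_cost, teacher_cost)[0, 1]
    r_student = np.corrcoef(human_cost, student_cost)[0, 1]
    print(f"teacher alignment r = {r_teacher:.2f}")  # lands near 0.64
    print(f"student alignment r = {r_student:.2f}")  # lands near 0.34

The noise levels are chosen so the two correlations land near the paper's reported 0.64 and 0.34 purely by construction; the point is the measurement shape, not the numbers.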

Key facts

  • Study tests the Hán Dān Xué Bù (Superficial Mimicry) hypothesis across 14 models
  • Teacher models, trained via reinforcement learning, align with human cognitive costs (r=0.64)
  • Student models, distilled via Supervised Fine-Tuning (SFT), suffer Functional Alignment Collapse (r=0.34)
  • Distilled students often underperform their pre-distillation baselines (Negative Transfer)
  • SFT induces a Cargo Cult effect: surface mimicry without the teacher's resource-allocation strategy
  • Paper published on arXiv as 2601.05019

Entities

Institutions

  • arXiv
