Chain-of-Thought Reasoning Traces Found to Be Performative in LLMs
A recent investigation published on arXiv (2605.11746) challenges the assumption that chain-of-thought (CoT) reasoning traces in large language models consistently reflect the internal computation that produces the answer. The researchers built a step-level Detect-Classify-Compare framework around an answer-commitment proxy, cross-validated with Patchscopes, tuned-lens probes, and causal direction ablation, and applied it to nine models across seven reasoning benchmarks. On average, latent commitment and explicit answer arrival coincided on only 61.9% of steps. The dominant mismatch pattern, confabulated continuation, accounted for 58.0% of mismatch events: the answer-commitment proxy stays stable while the trace keeps generating deliberative text without altering the committed answer. The study also compares architecture-matched Qwen2.5 and DeepSeek-R1-Distill models.
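The following is a minimal sketch of the step-level Detect-Classify-Compare idea, not the paper's implementation: here `latent_answer` stands in for the answer-commitment proxy (which the paper validates with Patchscopes, tuned-lens probes, and causal direction ablation), and the step segmentation, the substring check for explicit arrival, and the `late_commitment` label are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Step:
    text: str           # visible trace text emitted at this reasoning step
    latent_answer: str  # answer decoded from hidden states by some probe
                        # (stand-in for the paper's answer-commitment proxy)

def first_commitment_step(steps: list[Step]) -> int | None:
    """Detect: earliest step from which the probed latent answer never changes."""
    if not steps:
        return None
    final = steps[-1].latent_answer
    for i in range(len(steps)):
        if all(s.latent_answer == final for s in steps[i:]):
            return i
    return None

def explicit_arrival_step(steps: list[Step], final_answer: str) -> int | None:
    """Detect: first step whose visible text states the final answer.
    A bare substring check; a real pipeline would need more careful matching."""
    for i, s in enumerate(steps):
        if final_answer in s.text:
            return i
    return None

def classify(steps: list[Step], final_answer: str) -> str:
    """Classify-Compare: relate latent commitment to explicit answer arrival."""
    latent = first_commitment_step(steps)
    explicit = explicit_arrival_step(steps, final_answer)
    if latent is None or explicit is None:
        return "undetected"
    if latent == explicit:
        return "aligned"
    if latent < explicit:
        # Proxy already stable while the trace keeps emitting deliberative
        # text: the pattern the paper calls confabulated continuation.
        return "confabulated_continuation"
    return "late_commitment"  # placeholder label, not from the paper

# Toy usage: the probe commits to "B" at step 0, but the trace only states
# it at step 2, so the trajectory classifies as confabulated continuation.
steps = [
    Step("Let me consider the options...", latent_answer="B"),
    Step("Option A seems plausible at first, yet...", latent_answer="B"),
    Step("So the answer is B.", latent_answer="B"),
]
print(classify(steps, final_answer="B"))  # confabulated_continuation
```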
Key facts
- Chain-of-thought traces are used to improve model capability and audit behavior.
- The study tests the assumption that the visible trace is synchronized with the answer-determining computation.
- A step-level Detect-Classify-Compare framework was built.
- The answer-commitment proxy was cross-validated with Patchscopes, tuned-lens probes, and causal direction ablation.
- Nine models and seven reasoning benchmarks were tested.
- Latent commitment and explicit answer arrival align on only 61.9% of steps on average.
- Confabulated continuation is the dominant mismatch pattern at 58.0% of mismatch events (see the arithmetic sketch after this list).
- The committed answer does not change during confabulated continuation steps.
- Architecture-matched Qwen2.5 and DeepSeek-R1-Distill models were included.
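Putting the two headline numbers together: if 61.9% of steps align on average, then 38.1% mismatch, and, assuming each mismatch event corresponds to one mismatched step (an assumption; the paper reports 58.0% as a share of mismatch events, not of steps), confabulated continuation would account for roughly 22% of all steps:

```python
align_rate = 0.619                  # steps where latent and explicit arrival coincide
mismatch_rate = 1.0 - align_rate    # 0.381 of steps, on average
confab_share = 0.580                # reported share of mismatch *events*

# Assumption: one mismatch event per mismatched step, so shares compose
# multiplicatively across the two reported figures.
confab_rate = mismatch_rate * confab_share
print(f"~{confab_rate:.1%} of all steps")  # ~22.1%
```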