Hidden States Reveal Task-Relevant Information in Chain-of-Thought Reasoning
A recent arXiv paper (2604.23351) examines whether chain-of-thought (CoT) tokens carry task-relevant information beyond serving as mere explanations. Using activation patching on GSM8K, the researchers transferred token-level hidden states from a CoT generation into a direct-answer run on the same query. Patching yields higher accuracy than both direct-answer prompting and the original CoT trace, suggesting that individual CoT tokens carry enough information to recover the correct answer even when the original trace is flawed. Task-relevant information is more prevalent in correct CoT runs, is unevenly distributed across tokens, concentrates in mid-to-late layers, and appears earlier in the reasoning trace. The study also patches language tokens such as verbs.
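The patching procedure described above can be sketched in miniature: cache a hidden state at one (layer, token) position during a donor run, then overwrite the same position during a second run. The toy per-token MLP stack, dimensions, and layer/token indices below are illustrative assumptions, not the paper's actual model or setup.

```python
# Minimal sketch of token-level activation patching, assuming a toy
# per-token MLP stack (no attention); purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
D, L, T = 8, 4, 6                       # hidden dim, layers, sequence length
Ws = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(L)]

def forward(x, patch=None):
    """Run the stack; optionally overwrite one (layer, token) hidden state.

    patch = (layer_idx, token_idx, cached_vector) or None.
    Returns (output, cache) where cache[l] is the layer-l activation.
    """
    cache = []
    for l, W in enumerate(Ws):
        x = np.maximum(x @ W, 0.0)      # per-token layer with ReLU
        if patch is not None and patch[0] == l:
            x = x.copy()
            x[patch[1]] = patch[2]      # splice in the donor activation
        cache.append(x)
    return x, cache

layer, token = 2, 3                     # position to patch (arbitrary choice)

# Donor run (stands in for the CoT generation): record a hidden state.
cot_input = rng.standard_normal((T, D))
_, cot_cache = forward(cot_input)
donor = cot_cache[layer][token]

# Recipient run (stands in for the direct-answer run): patch it in.
direct_input = rng.standard_normal((T, D))
clean_out, _ = forward(direct_input)
patched_out, _ = forward(direct_input, patch=(layer, token, donor))

# In this toy model there is no cross-token mixing, so only the patched
# token's downstream computation changes.
assert not np.allclose(patched_out[token], clean_out[token])
assert np.allclose(np.delete(patched_out, token, axis=0),
                   np.delete(clean_out, token, axis=0))
```

In a real transformer the same pattern is usually implemented with forward hooks on a specific layer, and the patch propagates to other positions through attention; the study's accuracy comparisons come from decoding answers after such patched runs.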
Key facts
- Study uses activation patching on GSM8K
- Token-level hidden states transferred from CoT to direct-answer run
- Patching yields higher accuracy than direct-answer prompting and original CoT trace
- Task-relevant information more prevalent in correct CoT runs
- Information concentrates in mid-to-late layers
- Information appears earlier in reasoning trace
- Language tokens such as verbs are patched
- Published on arXiv with ID 2604.23351
Entities
Institutions
- arXiv