Chain-of-Thought Gains Come from Local Token Co-occurrence, Not Global Logic

publication · 2026-05-27

A recent arXiv preprint (2605.26795) investigates why chain-of-thought (CoT) prompting improves language model accuracy. Rather than studying how models behave while generating a rationale, the authors analyze probe time: with a fixed rationale already in context, which properties of that text change the model's answer? They identify two complementary sources of gain. First, a globally word-shuffled rationale still significantly outperforms the no-rationale baseline, which points to a notable lexical activation effect. Second, the additional benefit of structured text comes more from short-range token adjacency than from the logical ordering of sentences: preserving contiguous runs of just 2–3 tokens recovers most of the additional gain toward full CoT performance. Control experiments rule out both the copying of explicit answer declarations or values and full grammatical realization as the main drivers.

Key facts

arXiv paper 2605.26795 examines chain-of-thought prompting at probe time.
Globally word-shuffled rationales outperform no-rationale baselines.
Short-range token adjacency (contiguous runs of 2–3 tokens) accounts for most of the additional gain from structured rationales.
Sentence-level logical ordering contributes less than local co-occurrence.
Copying of explicit answer declarations or values is not the main cause.
Full grammatical realization is not required for the improvement.
The study uses a fixed rationale in context to isolate probe-time effects.
A lexical activation effect is identified as a complementary source of gain.
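
The two rationale perturbations described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and a simple "split into fixed-size chunks, then shuffle the chunks" scheme; the paper's actual tokenization and perturbation procedure are not given in this summary, so the function names, the chunking scheme, and the example rationale are all hypothetical.

```python
import random


def global_shuffle(tokens, rng):
    """Shuffle all tokens independently, so adjacency survives only by chance."""
    out = list(tokens)
    rng.shuffle(out)
    return out


def chunk_shuffle(tokens, n, rng):
    """Keep contiguous runs of n tokens intact, but shuffle the order of the runs.

    Assumption: fixed-size, non-overlapping chunks. The final chunk may be shorter.
    """
    chunks = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    rng.shuffle(chunks)
    return [tok for chunk in chunks for tok in chunk]


def adjacency_preserved(original, perturbed):
    """Fraction of distinct adjacent token pairs in `original` that also appear in `perturbed`."""
    orig_pairs = set(zip(original, original[1:]))
    pert_pairs = set(zip(perturbed, perturbed[1:]))
    if not orig_pairs:
        return 0.0
    return len(orig_pairs & pert_pairs) / len(orig_pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical rationale, used only to show the perturbations.
    rationale = "the train covers 60 km each hour so 3 hours gives 180 km".split()
    for name, variant in [
        ("global shuffle", global_shuffle(rationale, rng)),
        ("2-token chunks", chunk_shuffle(rationale, 2, rng)),
        ("3-token chunks", chunk_shuffle(rationale, 3, rng)),
    ]:
        score = adjacency_preserved(rationale, variant)
        print(f"{name}: {' '.join(variant)} (adjacent pairs kept: {score:.2f})")
```

Both perturbations keep exactly the same bag of words, so any accuracy difference between them would have to come from which tokens sit next to each other, which is the contrast the study draws between lexical activation and local co-occurrence.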
