LLM Reasoning as Dynamic Decoding State via Entropy Phase Transitions
A new arXiv paper (2605.22873) investigates when chain-of-thought (CoT) reasoning is beneficial for large language models. The authors find that CoT often yields marginal or negative gains on factual and open-ended tasks while increasing token consumption. They propose that reasoning is a dynamic decoding state, signaled by early-stage entropy dynamics: tasks benefiting from CoT show consistent entropy reduction, while others exhibit unstable or increasing patterns. This is interpreted as a phase transition from high-entropy exploration to low-entropy structured reasoning. The paper introduces EDRM (Entropy-Driven Reasoning Modulation) to adaptively apply CoT based on entropy signals.
Key facts
- Chain-of-thought reasoning often provides marginal or negative gains on factual and open-ended tasks.
- The authors propose that LLM reasoning is a dynamic decoding state that emerges during generation, rather than a fixed property of the prompt.
- They report that early-stage entropy dynamics reliably signal whether CoT is beneficial.
- Tasks benefiting from CoT exhibit consistent entropy reduction.
- Tasks not benefiting from CoT display unstable or increasing entropy patterns.
- The behavior is interpreted as a phase-transition-like shift from high-entropy exploration to low-entropy structured reasoning.
- The paper proposes EDRM (Entropy-Driven Reasoning Modulation) to adaptively apply CoT.
- The study is posted as a preprint on arXiv with ID 2605.22873.
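
To make the entropy signal described above concrete, here is a minimal sketch of how early-stage entropy might be used to gate CoT. It is not the paper's EDRM implementation: the function names, the `window` and `slope_threshold` values, and the use of a least-squares slope as the "consistent reduction" test are all assumptions made for illustration. It assumes you already have the model's next-token probability distributions for the first few decoding steps.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_slope(entropies):
    """Least-squares slope of entropy against decoding step index."""
    n = len(entropies)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(entropies) / n
    cov = sum((i - mean_x) * (h - mean_y) for i, h in enumerate(entropies))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var

def should_continue_cot(step_distributions, window=8, slope_threshold=-0.01):
    """Hypothetical gate: keep reasoning only if early entropy is falling.

    window and slope_threshold are illustrative values, not taken from the paper.
    """
    early = [token_entropy(p) for p in step_distributions[:window]]
    return entropy_slope(early) < slope_threshold

def peaked(p_top, vocab=4):
    """Toy distribution: one token gets p_top, the rest share the remainder."""
    rest = (1.0 - p_top) / (vocab - 1)
    return [p_top] + [rest] * (vocab - 1)

# Toy traces: one sharpens over time, one stays flat, one grows more uncertain.
sharpening = [peaked(p) for p in (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95)]
flat = [peaked(0.4) for _ in range(8)]
diffusing = list(reversed(sharpening))
```

With these toy traces, `should_continue_cot(sharpening)` returns `True`, while the flat and diffusing traces return `False`. A slope test is only one way to capture "consistent reduction"; a real system would need thresholds tuned on held-out tasks and would likely want a measure of stability as well as trend.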
Entities
Institutions
- arXiv