LLM Reasoning as Dynamic Decoding State via Entropy Phase Transitions

ai-technology · 2026-05-25

A new arXiv paper (2605.22873) investigates when chain-of-thought (CoT) reasoning is beneficial for large language models. The authors find that CoT often yields marginal or negative gains on factual and open-ended tasks while increasing token consumption. They propose that reasoning is a dynamic decoding state, signaled by early-stage entropy dynamics: tasks benefiting from CoT show consistent entropy reduction, while others exhibit unstable or increasing patterns. This is interpreted as a phase transition from high-entropy exploration to low-entropy structured reasoning. The paper introduces EDRM (Entropy-Driven Reasoning Modulation) to adaptively apply CoT based on entropy signals.
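The entropy signal described above can be illustrated with a minimal sketch. It is not the paper's EDRM implementation, whose details are not given here. It assumes per-step next-token probability distributions are available, measures each step's Shannon entropy, and treats a sufficiently negative least-squares slope over the early steps as "consistent entropy reduction". The function names, the slope criterion, and the `slope_threshold` value are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_slope(entropies: Sequence[float]) -> float:
    """Least-squares slope of entropy against decoding step index."""
    n = len(entropies)
    if n < 2:
        return 0.0  # a trend needs at least two points
    mean_x = (n - 1) / 2
    mean_y = sum(entropies) / n
    cov = sum((i - mean_x) * (h - mean_y) for i, h in enumerate(entropies))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def should_use_cot(
    step_probs: Sequence[Sequence[float]],
    slope_threshold: float = -0.05,  # assumed value, not taken from the paper
) -> bool:
    """Return True when early-step entropy falls faster than the threshold."""
    entropies = [token_entropy(p) for p in step_probs]
    return entropy_slope(entropies) < slope_threshold


# Toy early-decoding distributions over a 4-token vocabulary.
falling = [
    [0.25, 0.25, 0.25, 0.25],  # near-uniform: high entropy
    [0.70, 0.10, 0.10, 0.10],
    [0.97, 0.01, 0.01, 0.01],  # sharply peaked: low entropy
]
rising = list(reversed(falling))

print(should_use_cot(falling))
print(should_use_cot(rising))
```

A single slope is a deliberately crude summary. The "unstable" patterns mentioned above would call for a measure of variability as well, which this sketch leaves out.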

Key facts

Chain-of-thought reasoning often provides marginal or negative gains on factual and open-ended tasks.
The authors frame LLM reasoning as a dynamic decoding state that emerges during generation.
According to the paper, early-stage entropy dynamics reliably signal whether CoT is beneficial.
Tasks benefiting from CoT exhibit consistent entropy reduction.
Tasks not benefiting from CoT display unstable or increasing entropy patterns.
The behavior is interpreted as a phase-transition-like shift from high-entropy exploration to low-entropy structured reasoning.
The paper proposes EDRM (Entropy-Driven Reasoning Modulation) to adaptively apply CoT.
The study is posted on arXiv with ID 2605.22873.
