ARTFEED — Contemporary Art Intelligence

Overcomplete Reasoning Traces: 46% of Steps Removable in LLM Chain-of-Thought

other · 2026-05-16

A recent study published on arXiv (2605.14358) examines overcomplete reasoning traces in language models, defining the minimal core as the smallest subset of steps that preserves the final answer or predictive distribution. The researchers analyzed six deliberative reasoning benchmarks, spanning arithmetic, competition math, expert scientific reasoning, and commonsense multi-hop QA, and found substantial overcompleteness: on average, 46% of steps can be removed via greedy minimal-core extraction while the original answer is preserved in 86% of instances. Predictive support is also heavily concentrated, with the top three steps contributing the majority of the weight. The study additionally introduces metrics for compression ratio, redundancy mass, step necessity, and necessity concentration.
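The greedy extraction procedure mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a hypothetical `check_answer` oracle that returns the model's final answer when conditioned on only the supplied reasoning steps.

```python
def greedy_minimal_core(steps, check_answer):
    """Greedily drop steps whose removal leaves the final answer unchanged.

    steps        -- ordered list of reasoning steps (e.g. strings)
    check_answer -- hypothetical oracle: steps -> final answer
    """
    target = check_answer(steps)          # answer produced by the full trace
    core = list(steps)
    for step in steps:                    # try removing each step in order
        candidate = [s for s in core if s is not step]
        if check_answer(candidate) == target:
            core = candidate              # step was redundant: drop it
    return core


# Toy oracle: the answer survives iff the trace still contains step "C",
# so greedy extraction shrinks a four-step trace to the single core step.
steps = ["A", "B", "C", "D"]
print(greedy_minimal_core(steps, lambda s: "C" in s))  # → ['C']
```

Note the single-pass greedy sweep only finds *a* small core, not a provably minimal one; order-dependent interactions between steps can leave redundancy behind.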

Key facts

  • arXiv paper 2605.14358 studies overcomplete reasoning traces in language models.
  • Defines minimal core as smallest subset of steps preserving final answer or predictive distribution.
  • Introduces metrics: compression ratio, redundancy mass, step necessity, necessity concentration.
  • Evaluated on six deliberative reasoning benchmarks.
  • 46% of steps are removable on average under greedy minimal-core extraction.
  • Original answer preserved in 86% of cases after removal.
  • Predictive support is concentrated in top three steps.
  • Benchmarks include arithmetic, competition math, expert scientific reasoning, commonsense multi-hop QA.
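The headline numbers can be related to the proposed metrics under assumed definitions (the paper's exact formulas may differ): compression ratio as the fraction of steps retained in the minimal core, and redundancy mass as the fraction removable.

```python
def compression_ratio(trace_len, core_len):
    """Assumed definition: fraction of steps kept in the minimal core."""
    return core_len / trace_len

def redundancy(trace_len, core_len):
    """Assumed definition: fraction of steps removable from the trace."""
    return 1 - core_len / trace_len

# Example mirroring the headline figure: a 100-step trace whose minimal
# core retains 54 steps has 46% of its steps removable.
print(round(redundancy(100, 54), 2))  # → 0.46
```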

Entities

Institutions

  • arXiv

Sources