SLAT: Segment-Level Adaptive Trimming for Efficient CoT Reasoning
SLAT (Segment-Level Adaptive Trimming) is a reinforcement learning framework for reducing the computational overhead of chain-of-thought reasoning. Large reasoning models often generate redundant reasoning steps, a phenomenon known as overthinking, which increases inference cost without improving accuracy. Existing methods apply token-uniform length penalties, which can suppress useful reasoning along with the redundant parts. SLAT instead locates inefficiency in high-probability segments with low marginal utility and selectively suppresses those segments. The framework rests on a theoretical characterization of segment suboptimality under a correctness-length trade-off. Empirical results on standard benchmarks show that SLAT reduces reasoning length while maintaining or improving accuracy. The work is described in arXiv paper 2605.30832.
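To make the contrast between a token-uniform penalty and a segment-selective one concrete, here is a minimal sketch. It is not the paper's implementation: the `Segment` structure, the two thresholds, the weight `lam`, and the example values are all hypothetical, chosen only to illustrate the idea of penalizing high-probability, low-utility segments while leaving the rest untouched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A contiguous chunk of a reasoning chain (hypothetical structure)."""
    text: str
    num_tokens: int
    mean_token_prob: float   # average model probability of the segment's tokens
    marginal_utility: float  # assumed estimate of the correctness gain from keeping it


def uniform_penalty(segments: List[Segment], lam: float) -> List[float]:
    """Token-uniform baseline: every token costs the same, useful or not."""
    return [lam * s.num_tokens for s in segments]


def selective_penalty(
    segments: List[Segment],
    lam: float,
    prob_threshold: float = 0.9,      # assumed cutoff for "high probability"
    utility_threshold: float = 0.01,  # assumed cutoff for "low marginal utility"
) -> List[float]:
    """Penalize only segments that are both high-probability and low-utility."""
    penalties = []
    for s in segments:
        redundant = (
            s.mean_token_prob >= prob_threshold
            and s.marginal_utility <= utility_threshold
        )
        penalties.append(lam * s.num_tokens if redundant else 0.0)
    return penalties


# Illustrative values only; they are not taken from any benchmark.
chain = [
    Segment("Set up the equation.", 12, 0.70, 0.30),
    Segment("Wait, let me re-check that again.", 20, 0.95, 0.00),
    Segment("Solve for x.", 8, 0.80, 0.25),
]

print(uniform_penalty(chain, lam=0.5))    # [6.0, 10.0, 4.0]
print(selective_penalty(chain, lam=0.5))  # [0.0, 10.0, 0.0]
```

In this toy example the uniform penalty charges the two useful steps as well, whereas the selective version charges only the re-checking segment. How segments are delimited and how marginal utility is estimated are left open here.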
Key facts
- SLAT stands for Segment-Level Adaptive Trimming.
- It addresses overthinking in chain-of-thought reasoning.
- Overthinking refers to structural redundancy in reasoning chains.
- Existing methods use token-uniform length penalties.
- SLAT targets high-probability segments with low marginal utility.
- The framework is based on a correctness-length trade-off objective.
- Empirical results on standard benchmarks show reduced reasoning length with accuracy maintained or improved.
- The paper is available on arXiv under ID 2605.30832.
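As an illustration of what a correctness-length trade-off can look like, the following generic form is one common way to write it. The notation is an assumption made for this summary and is not taken from the paper: $R(x, y)$ is a correctness reward for response $y$ to prompt $x$, $|y|$ is its token length, $\lambda \ge 0$ is a penalty weight, $\mathcal{S}(y)$ is a segmentation of $y$, and $\rho(s) \in \{0, 1\}$ is a hypothetical flag marking a segment as redundant.

```latex
% Token-uniform length penalty: every token is charged equally.
\[
J_{\text{uniform}}(\theta)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
    \bigl[ R(x, y) - \lambda \, |y| \bigr]
\]

% Segment-selective variant (illustrative): only flagged segments are charged.
\[
J_{\text{segment}}(\theta)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
    \Bigl[ R(x, y) - \lambda \sum_{s \in \mathcal{S}(y)} \rho(s) \, |s| \Bigr]
\]
```

When $\rho(s) = 1$ for every segment, the second objective reduces to the first, since the segment lengths sum to $|y|$.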
Entities
Institutions
- arXiv