Compressed Reasoning Data in LLM Post-Training: A Taxonomy and Empirical Study
A new paper on arXiv (2605.28008) investigates how compressed reasoning data behaves in supervised fine-tuning (SFT) of large language models (LLMs). The authors propose a taxonomy of chain-of-thought (CoT) reasoning with three levels: Explicit CoT, which writes out every operation; Composed CoT, which merges multiple operations into a single step; and Implicit CoT, which omits intermediate operations. They construct a synthetic compositional reasoning task that lets them control difficulty, compression granularity, and data size, and they run experiments across different model families and sizes. The key finding is that coarser CoT requires more training data to match the performance of finer CoT, but it can reduce token cost. The study aims to clarify when and how compressed reasoning data works in post-training.
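To make the three levels concrete, the following is a minimal sketch of how one training target might be written at each granularity. It is not the paper's task or data format: the operation names in `OPS`, the `build_target` helper, the line format, and the whitespace-based token count are all hypothetical choices made for illustration.

```python
from typing import List, Optional

# Hypothetical toy operations; the paper's actual synthetic task is not reproduced here.
OPS = {
    "add3": lambda x: x + 3,
    "double": lambda x: x * 2,
    "sub1": lambda x: x - 1,
}


def run_chain(start: int, op_names: List[str]) -> List[int]:
    """Apply the operations in order and return the value after each one."""
    values = []
    x = start
    for name in op_names:
        x = OPS[name](x)
        values.append(x)
    return values


def build_target(start: int, op_names: List[str], group_size: Optional[int]) -> str:
    """Render a training target for one example.

    group_size=1     -> Explicit CoT (one line per operation)
    group_size=k > 1 -> Composed CoT (k operations merged into one line)
    group_size=None  -> Implicit CoT (assumed here to mean answer only)
    """
    values = run_chain(start, op_names)
    if group_size is None:
        return f"answer: {values[-1]}"
    lines = []
    prev = start
    for i in range(0, len(op_names), group_size):
        group = op_names[i:i + group_size]
        result = values[i + len(group) - 1]  # value after the last op in this group
        lines.append(f"{'+'.join(group)}({prev}) = {result}")
        prev = result
    lines.append(f"answer: {values[-1]}")
    return "\n".join(lines)


def token_cost(target: str) -> int:
    """Crude proxy for token cost: whitespace-separated pieces (an assumption)."""
    return len(target.split())


chain = ["add3", "double", "sub1", "double"]
explicit = build_target(2, chain, group_size=1)
composed = build_target(2, chain, group_size=2)
implicit = build_target(2, chain, group_size=None)

for name, target in [("explicit", explicit), ("composed", composed), ("implicit", implicit)]:
    print(name, token_cost(target))
    print(target)
```

Under this toy format, the same four-operation chain costs 14 whitespace tokens as Explicit CoT, 8 as Composed CoT with two operations per step, and 2 as Implicit CoT. These numbers describe the sketch only, not the paper's measurements.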
Key facts
- Paper on arXiv: 2605.28008
- Taxonomy of CoT: Explicit, Composed, Implicit
- Synthetic compositional reasoning task used
- Experiments across multiple model families and sizes
- Coarser CoT requires more data to match finer CoT performance
- Compressed reasoning can reduce token cost
- Focus on supervised fine-tuning (SFT)
- Controlled variation of difficulty, compression granularity, data size
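The controlled variation listed above can be pictured as a grid over three knobs. The sketch below is an assumption-laden illustration, not the paper's experimental setup: it treats chain length as the difficulty knob, the number of operations merged per step as the granularity knob, and uses placeholder values for all three axes.

```python
from itertools import product
from math import ceil
from typing import Optional


def reasoning_steps(chain_length: int, group_size: Optional[int]) -> int:
    """Number of intermediate steps written out for one example.

    group_size=None stands for Implicit CoT (no intermediate steps);
    otherwise the chain is split into groups of at most group_size operations.
    """
    if group_size is None:
        return 0
    return ceil(chain_length / group_size)


# Placeholder axis values, chosen only for illustration.
chain_lengths = [4, 8]        # difficulty (assumed to be chain length)
group_sizes = [1, 2, None]    # 1 = explicit, 2 = composed, None = implicit
train_sizes = [1_000, 10_000]  # number of SFT examples

grid = [
    {
        "chain_length": n,
        "group_size": g,
        "train_size": m,
        "steps_per_example": reasoning_steps(n, g),
        "total_steps": reasoning_steps(n, g) * m,
    }
    for n, g, m in product(chain_lengths, group_sizes, train_sizes)
]

for config in grid:
    print(config)
```

Holding two knobs fixed while sweeping the third is what allows a data-versus-granularity trade-off like the one reported to be isolated; the step counts here only show how the supervised reasoning length shrinks as steps are merged.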
Entities
Institutions
- arXiv