CLORE: A Framework for Efficient LLM Reasoning via Content-Level Optimization
Researchers propose CLORE, a content-level optimization framework for improving reasoning efficiency in large language models. Reinforcement learning post-training often produces reasoning traces that are unnecessarily long, repetitive, or semantically opaque. CLORE uses an external augmentation model to edit correct on-policy rollouts, deleting repetitive, illegible, or task-irrelevant content while preserving the final answer. The resulting augmented-original pairs are optimized with an auxiliary reference-free DPO objective alongside standard policy-gradient training. Augmentation is restricted to correct trajectories and performed by local deletion, which keeps the edited outputs concise. The paper is available on arXiv under ID 2605.22211.
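As a rough illustration of how a reference-free DPO term can sit alongside a policy-gradient loss, the sketch below scores one pair by the difference in sequence log-likelihood under the current policy, with no frozen reference model. It is not taken from the paper: the function names, the `beta` and `aux_weight` parameters, and the choice of which log-likelihoods to feed in (summed or length-normalized) are all assumptions made for the example.

```python
import math
from typing import List, Tuple


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reference_free_dpo_loss(
    logp_augmented: float, logp_original: float, beta: float = 0.1
) -> float:
    # Hypothetical helper: the augmented (edited) rollout is treated as
    # preferred and the original rollout as dispreferred. "Reference-free"
    # here means the margin uses only the current policy's log-likelihoods.
    margin = beta * (logp_augmented - logp_original)
    return -log_sigmoid(margin)


def combined_loss(
    pg_loss: float,
    pairs: List[Tuple[float, float]],
    beta: float = 0.1,
    aux_weight: float = 1.0,
) -> float:
    # Assumed combination: policy-gradient loss plus a weighted mean of the
    # auxiliary pairwise loss. With no pairs, only the policy-gradient loss
    # remains.
    if not pairs:
        return pg_loss
    aux = sum(reference_free_dpo_loss(a, o, beta) for a, o in pairs) / len(pairs)
    return pg_loss + aux_weight * aux
```

When the two log-likelihoods are equal the pairwise term is log 2, and it falls below that as the policy assigns more probability to the edited rollout than to the original.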
Key facts
- CLORE stands for Content-Level Optimization for Reasoning Efficiency
- arXiv ID: 2605.22211
- Addresses unnecessarily long, repetitive, or semantically opaque reasoning traces from RL post-training
- Uses an external augmentation model to delete repetitive segments, illegible or task-irrelevant content, and superfluous reasoning
- Preserves the final answer
- Optimizes augmented-original pairs with an auxiliary reference-free DPO objective
- Restricts augmentation to correct trajectories and performs local deletion
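The constraints listed above (correct trajectories only, deletion-only edits, final answer preserved) can be sketched as a filter that decides whether an edited rollout may be paired with its original. This is a minimal sketch under stated assumptions, not the paper's procedure: `is_deletion_only`, `build_pair`, the whitespace tokenization, and the way answers are passed in as strings are all hypothetical, and the external augmentation model itself is not modeled.

```python
from typing import List, Optional, Tuple


def is_deletion_only(original: List[str], edited: List[str]) -> bool:
    # True if `edited` can be obtained from `original` by deleting tokens
    # only, i.e. `edited` is an in-order subsequence of `original`.
    remaining = iter(original)
    return all(token in remaining for token in edited)


def build_pair(
    original: List[str],
    edited: List[str],
    original_answer: str,
    edited_answer: str,
    is_correct: bool,
) -> Optional[Tuple[List[str], List[str]]]:
    # Returns (preferred, dispreferred) = (edited, original), or None when
    # the edit violates one of the assumed constraints.
    if not is_correct:
        return None  # only correct trajectories are augmented
    if edited_answer != original_answer:
        return None  # the final answer must be preserved
    if len(edited) >= len(original):
        return None  # nothing was actually deleted
    if not is_deletion_only(original, edited):
        return None  # the edit inserted or reordered content
    return edited, original


# Usage: a rollout that repeats itself, and an edit that drops the repeat.
original = "so x = 2 . wait , so x = 2 . answer : 2".split()
edited = "so x = 2 . answer : 2".split()
pair = build_pair(original, edited, "2", "2", is_correct=True)
```

Checking the subsequence property explicitly is one way to guard against an augmentation model that paraphrases or reorders content instead of only deleting it.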
Entities
Institutions
- arXiv