CLORE: A Framework for Efficient LLM Reasoning via Content-Level Optimization

ai-technology · 2026-05-23

Researchers propose CLORE, a content-level optimization framework for improving reasoning efficiency in large language models. Reinforcement learning post-training often produces long, repetitive, or opaque reasoning traces. CLORE edits correct on-policy rollouts by deleting repetitive, illegible, or task-irrelevant content while preserving the final answer. An external augmentation model performs the edits, and the resulting augmented-original pairs are optimized with a reference-free DPO objective alongside standard policy-gradient training. The method restricts augmentation to correct trajectories and performs only local deletion, keeping edited outputs concise. The paper is available on arXiv under ID 2605.22211.
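The pair-construction step described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the `Rollout` type, `build_pairs`, `is_deletion_only`, and the toy `drop_repeats` editor are hypothetical names introduced here, and `drop_repeats` is only a stand-in for the external augmentation model, whose actual interface the summary does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rollout:
    """Hypothetical container for one on-policy rollout."""
    prompt: str
    steps: List[str]   # reasoning trace, split into segments
    answer: str        # final answer, stored separately from the trace
    correct: bool      # whether the final answer was verified as correct


def is_deletion_only(original: List[str], edited: List[str]) -> bool:
    """True if `edited` can be obtained from `original` purely by deleting segments."""
    remaining = iter(original)
    # Each edited segment must appear, in order, in what is left of the original.
    return all(segment in remaining for segment in edited)


def drop_repeats(rollout: Rollout) -> List[str]:
    """Toy stand-in for the augmentation model: delete exact repeated segments."""
    seen = set()
    kept = []
    for segment in rollout.steps:
        if segment not in seen:
            seen.add(segment)
            kept.append(segment)
    return kept


def build_pairs(
    rollouts: List[Rollout],
    augment: Callable[[Rollout], List[str]],
) -> List[Tuple[Rollout, List[str]]]:
    """Return (original rollout, edited steps) pairs for correct rollouts only."""
    pairs = []
    for rollout in rollouts:
        if not rollout.correct:
            continue  # augmentation is restricted to correct trajectories
        edited = augment(rollout)
        if edited == rollout.steps:
            continue  # nothing was deleted, so there is no pair to learn from
        if not is_deletion_only(rollout.steps, edited):
            continue  # reject edits that rewrite or reorder content
        # The answer field is never touched, so the final answer is preserved.
        pairs.append((rollout, edited))
    return pairs
```

In this sketch, incorrect rollouts produce no pairs, and any edit that is not a pure deletion is discarded, which mirrors the summary's description of local deletion on correct trajectories.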

Key facts

CLORE stands for Content-Level Optimization for Reasoning Efficiency
arXiv ID: 2605.22211
Announce type: new
Addresses unnecessarily long, repetitive, or semantically opaque reasoning traces from RL post-training
Uses an external augmentation model to delete repetitive segments, illegible or task-irrelevant content, and superfluous reasoning
Preserves the final answer
Optimizes augmented-original pairs with an auxiliary reference-free DPO objective
Restricts augmentation to correct trajectories and performs local deletion
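As a rough illustration of the auxiliary objective, the sketch below computes a reference-free DPO-style loss on one augmented-original pair and adds it to a policy-gradient loss. It assumes the edited (augmented) trace is treated as the preferred response, and that the two losses are combined by a weighted sum. The function names, `beta`, and `aux_weight` are hypothetical. The summary does not state the paper's exact formulation, including whether sequence log-probabilities are length-normalized.

```python
import math
from typing import List, Tuple


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reference_free_dpo_loss(
    logp_edited: float,
    logp_original: float,
    beta: float = 0.1,  # assumed temperature; not taken from the source
) -> float:
    """DPO-style loss without a reference model.

    Standard DPO compares policy-to-reference log-ratios; the reference-free
    variant drops the reference terms and uses the policy log-probabilities
    of the two responses directly.
    """
    margin = beta * (logp_edited - logp_original)
    return -log_sigmoid(margin)


def combined_loss(
    pg_loss: float,
    pair_logps: List[Tuple[float, float]],  # (logp_edited, logp_original) per pair
    beta: float = 0.1,
    aux_weight: float = 1.0,  # assumed weighting; not taken from the source
) -> float:
    """Policy-gradient loss plus the mean auxiliary preference loss."""
    if not pair_logps:
        return pg_loss  # no correct, edited rollouts in this batch
    aux = sum(
        reference_free_dpo_loss(edited, original, beta)
        for edited, original in pair_logps
    ) / len(pair_logps)
    return pg_loss + aux_weight * aux
```

When the policy assigns equal log-probability to both traces, the auxiliary loss equals log 2, and it shrinks as the edited trace becomes more likely than the original.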
