CopT Reverses Chain-of-Thought for Efficient LLM Reasoning
A new paper introduces CopT (Contrastive On-Policy Thinking), a reasoning pipeline that reverses the traditional chain-of-thought (CoT) order. Instead of thinking before answering, CopT first generates a draft answer, then performs on-policy thinking conditioned on that draft to reflect on it and correct it. Continuous embeddings serve as inference-time contrastive verifiers that assess whether the draft answer can be trusted. The approach aims to reduce token costs and to avoid performative reasoning. The paper is available on arXiv under ID 2605.20075.
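The answer-first control flow described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. Every name here (`answer_then_think`, `contrastive_score`, `draft_fn`, `embed_fn`, `think_fn`, the two anchor embeddings, and `threshold`) is hypothetical. The sketch also assumes that the verifier compares a draft embedding against a "trusted" and an "untrusted" reference by cosine similarity, and that thinking is skipped when the draft looks trustworthy; the summary does not confirm either detail.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def contrastive_score(draft: Vector, trusted: Vector, untrusted: Vector) -> float:
    """Assumed verifier: similarity to a 'trusted' anchor minus similarity
    to an 'untrusted' anchor. Higher means the draft looks more reliable."""
    return cosine(draft, trusted) - cosine(draft, untrusted)


@dataclass
class Result:
    answer: str
    used_thinking: bool
    score: float


def answer_then_think(
    question: str,
    draft_fn: Callable[[str], str],        # hypothetical: model emits a draft answer
    embed_fn: Callable[[str, str], Vector],  # hypothetical: continuous embedding of the draft
    think_fn: Callable[[str, str], str],   # hypothetical: thinking conditioned on the draft
    trusted: Vector,
    untrusted: Vector,
    threshold: float = 0.0,                # assumed decision boundary
) -> Result:
    # Step 1: answer first, without any preceding reasoning tokens.
    draft = draft_fn(question)
    # Step 2: score the draft with the contrastive verifier.
    score = contrastive_score(embed_fn(question, draft), trusted, untrusted)
    # Step 3 (assumption): keep a trusted draft and spend no thinking tokens.
    if score >= threshold:
        return Result(draft, False, score)
    # Step 4: otherwise reflect on the draft and return the corrected answer.
    return Result(think_fn(question, draft), True, score)


# Toy stand-ins for a real model, used only to exercise the control flow.
def toy_draft(question: str) -> str:
    return "4" if question == "2+2?" else "unsure"


def toy_embed(question: str, draft: str) -> Vector:
    return [1.0, 0.0] if draft == "4" else [0.0, 1.0]


def toy_think(question: str, draft: str) -> str:
    return f"revised({draft})"


TRUSTED, UNTRUSTED = [1.0, 0.0], [0.0, 1.0]
easy = answer_then_think("2+2?", toy_draft, toy_embed, toy_think, TRUSTED, UNTRUSTED)
hard = answer_then_think("hard?", toy_draft, toy_embed, toy_think, TRUSTED, UNTRUSTED)
```

In this toy run, `easy` keeps its draft without any thinking step, while `hard` falls below the threshold and is routed through `toy_think`. Under the gating assumption above, the token savings would come from the first case, where no reasoning tokens are generated at all.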
Key facts
- CopT reverses the order of thinking and answering in LLM reasoning.
- It first elicits a draft answer, then invokes on-policy thinking for reflection.
- Continuous embeddings are used as contrastive verifiers at inference time.
- The approach targets performative reasoning and unnecessary token costs.
- The paper is available on arXiv with ID 2605.20075.
Entities
Institutions
- arXiv