ARTFEED — Contemporary Art Intelligence

CopT Reverses Chain-of-Thought for Efficient LLM Reasoning

ai-technology · 2026-05-20

A new paper introduces CopT (Contrastive On-Policy Thinking), a reasoning pipeline that reverses the usual chain-of-thought (CoT) order. Instead of thinking before answering, CopT first generates a draft answer, then performs on-policy thinking conditioned on that draft to reflect on and correct it. Continuous embeddings serve as inference-time contrastive verifiers that assess whether the draft answer can be trusted. The approach aims to cut token costs and avoid performative reasoning. The paper is available on arXiv under ID 2605.20075.
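The answer-first ordering described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names (draft_then_think, draft_fn, trust_fn, think_fn), the Result type, and the threshold are all hypothetical. The summary does not say how the contrastive verifier is computed from continuous embeddings, so trust_fn is only a stand-in that returns a score. Skipping the thinking step when the draft scores well is likewise an assumption about how token costs might be reduced.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    answer: str
    used_thinking: bool
    trust: float


def draft_then_think(
    question: str,
    draft_fn: Callable[[str], str],
    trust_fn: Callable[[str, str], float],
    think_fn: Callable[[str, str], str],
    threshold: float = 0.5,
) -> Result:
    """Answer first, then reflect on the draft only if it looks unreliable."""
    # Step 1: produce a draft answer without spending reasoning tokens.
    draft = draft_fn(question)
    # Step 2: score the draft. Hypothetical stand-in for the verifier;
    # the real scoring method is not described in the summary.
    trust = trust_fn(question, draft)
    # Step 3 (assumed gating): keep a trusted draft as the final answer.
    if trust >= threshold:
        return Result(answer=draft, used_thinking=False, trust=trust)
    # Step 4: thinking is conditioned on the draft, for reflection and correction.
    revised = think_fn(question, draft)
    return Result(answer=revised, used_thinking=True, trust=trust)


# Toy stand-ins so the control flow can be run without a model.
def toy_draft(question: str) -> str:
    return "5"  # deliberately wrong draft


def toy_trust(question: str, draft: str) -> float:
    return 0.9 if draft == "4" else 0.1


def toy_think(question: str, draft: str) -> str:
    return "4"  # pretend reflection corrects the draft


result = draft_then_think("2 + 2", toy_draft, toy_trust, toy_think)
print(result)
```

With these toy functions the wrong draft receives a low score, so the thinking step runs and replaces it. A draft that scores above the threshold would be returned unchanged with no reasoning tokens spent.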

Key facts

  • CopT reverses the order of thinking and answering in LLM reasoning.
  • It first elicits a draft answer, then invokes on-policy thinking for reflection.
  • Continuous embeddings are used as contrastive verifiers at inference time.
  • The approach targets performative reasoning and unnecessary token costs.
  • The paper is available on arXiv with ID 2605.20075.

Entities

Institutions

  • arXiv
