ProxyCoT: Boosting Long-Context Reasoning in LLMs via Short Proxy Contexts

ai-technology · 2026-05-22

ProxyCoT is a training approach that improves long-context reasoning in large language models (LLMs) by transferring reasoning skills learned on short proxy contexts to full long contexts. A proxy context is a subset of the input that suffices for solving the task. Although current LLMs accept up to 10 million tokens, they still struggle with complex reasoning over long sequences. ProxyCoT first elicits high-quality chain-of-thought reasoning traces in short proxy contexts, using reinforcement learning or distillation from a larger teacher model. It then grounds these traces in the full long contexts via supervised fine-tuning. Experiments indicate that ProxyCoT consistently outperforms strong baselines at reduced computational cost. The method targets the performance gap between proxy and full contexts, which call for the same underlying reasoning process. The paper is available on arXiv under ID 2605.20201.

Key facts

ProxyCoT transfers reasoning from short proxy contexts to full long contexts.
LLMs currently support up to 10 million tokens but struggle with long-context reasoning.
ProxyCoT uses reinforcement learning or distillation from a larger teacher model.
The framework applies supervised fine-tuning to ground traces in full contexts.
ProxyCoT outperforms strong baselines with reduced computational cost.
The method addresses the performance disparity between proxy and full contexts.
The paper is on arXiv with ID 2605.20201.
Proxy contexts are subsets of the input that suffice for solving tasks.
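
The two stages described above can be illustrated with a small data-construction sketch. This is not the paper's implementation: the `Task` fields, the `proxy_indices` annotation, the `generate_trace` callable (standing in for a model trained with reinforcement learning or a larger teacher model), and the answer-matching filter are all hypothetical names and assumptions introduced for illustration. The sketch produces a reasoning trace from the short proxy context, then pairs that trace with the full long context to form a supervised fine-tuning example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    question: str
    documents: List[str]      # the full long context, split into passages
    proxy_indices: List[int]  # assumed annotation: passages that suffice to answer
    answer: str               # reference answer, used only for filtering


def build_proxy_context(task: Task) -> str:
    """Join only the passages marked as sufficient for solving the task."""
    return "\n".join(task.documents[i] for i in task.proxy_indices)


def build_full_context(task: Task) -> str:
    """Join every passage, i.e. the long input the final model must handle."""
    return "\n".join(task.documents)


def build_sft_example(
    task: Task,
    generate_trace: Callable[[str, str], str],
) -> Optional[Dict[str, str]]:
    """Stage 1: get a trace on the proxy context. Stage 2: pair it with the full context."""
    trace = generate_trace(build_proxy_context(task), task.question)
    # Assumed quality filter: drop traces that never mention the reference answer.
    if task.answer not in trace:
        return None
    prompt = f"{build_full_context(task)}\n\nQuestion: {task.question}"
    return {"prompt": prompt, "completion": trace}


def toy_generate_trace(context: str, question: str) -> str:
    """Placeholder for a real trace generator; it only echoes the context."""
    return f"The context states: {context} Therefore the answer is Dana."


if __name__ == "__main__":
    task = Task(
        question="Who wrote the report?",
        documents=["Filler passage A.", "The report was written by Dana.", "Filler passage B."],
        proxy_indices=[1],
        answer="Dana",
    )
    print(build_sft_example(task, toy_generate_trace))
```

In this sketch the expensive trace generation only ever sees the short proxy context, while the long context appears solely in the fine-tuning prompt, which is one plausible reading of where the reported cost reduction comes from.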
