Thinking as Compression: LLMs Naturally Shorten Context Without Special Training

ai-technology · 2026-05-28

A new arXiv paper reports that reasoning models can compress long contexts simply by generating thinking traces, with no dedicated compression module. The study introduces Thinking as Compression (TaC), a paradigm in which the model's own reasoning trace serves as the compressed context. Without any compression-specific training, TaC outperforms most representative compression methods. To control the output budget and curb shortcut behaviors, the authors propose TaC-C, which uses a simple reward mechanism to constrain the thinking output. The findings suggest that LLMs have intrinsic compression capabilities that remain underexplored.
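To make the idea concrete, here is a minimal Python sketch of a TaC-style pipeline under stated assumptions. It is not the authors' implementation: the prompt wording, the `<think>...</think>` tag convention, and the names `tac_compress`, `answer_with_compressed`, and `toy_generate` are all hypothetical, and `toy_generate` is a canned stand-in for a real thinking model.

```python
import re
from typing import Callable

# Assumption: the model wraps its reasoning in <think>...</think> tags.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def extract_thinking(raw_output: str) -> str:
    """Return the text inside the first <think> block, or the whole output if none."""
    match = THINK_PATTERN.search(raw_output)
    return match.group(1).strip() if match else raw_output.strip()


def tac_compress(context: str, question: str, generate: Callable[[str], str]) -> str:
    """Prompt the model on the full context and keep only its thinking trace."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}\n"
    return extract_thinking(generate(prompt))


def answer_with_compressed(trace: str, question: str, generate: Callable[[str], str]) -> str:
    """Answer the question using the thinking trace in place of the long context."""
    prompt = f"Notes:\n{trace}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt).strip()


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns canned text."""
    if prompt.startswith("Context:"):
        return "<think>The report says revenue rose 12% in 2024.</think>Revenue rose 12%."
    return "12%"


long_context = "filler sentence. " * 1000 + "Revenue rose 12% in 2024."
question = "How much did revenue rise?"
trace = tac_compress(long_context, question, toy_generate)
answer = answer_with_compressed(trace, question, toy_generate)
```

In a real setup, `generate` would wrap a call to a reasoning model; the point of the sketch is only that the downstream prompt carries the short trace rather than the original long input.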

Key facts

Paper ID: arXiv:2605.28713v1
Context compression shortens long inputs to speed up LLM inference
Existing methods rely on complex compression modules or compression-specific training
TaC directly prompts the thinking model to generate thinking traces as shortened context
TaC outperforms most representative compression methods
TaC-C introduces a simple reward mechanism to control the thinking budget and avoid shortcut behaviors
The research was published on arXiv
The arXiv listing marks the paper as a new announcement
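The summary does not spell out the form of the TaC-C reward, so the following is only an illustrative guess at what a simple budget-constrained reward could look like. The function name `budget_reward`, the all-or-nothing scoring, and the token-count inputs are assumptions, and the sketch covers budget control only, not the shortcut-avoidance part.

```python
def budget_reward(answer_correct: bool, trace_tokens: int, budget: int) -> float:
    """Hypothetical reward: 1.0 only if the answer is correct AND the
    thinking trace stays within the token budget; otherwise 0.0."""
    if budget <= 0:
        raise ValueError("budget must be a positive number of tokens")
    if trace_tokens > budget:
        return 0.0  # over budget: no reward, regardless of correctness
    return 1.0 if answer_correct else 0.0


within = budget_reward(answer_correct=True, trace_tokens=180, budget=256)
over = budget_reward(answer_correct=True, trace_tokens=400, budget=256)
wrong = budget_reward(answer_correct=False, trace_tokens=100, budget=256)
```

A hard cutoff like this is the simplest choice; a graded penalty for exceeding the budget would be another option, and the paper may use something different.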
