
LessIsMore: Training-Free Sparse Attention for Efficient Reasoning

ai-technology · 2026-04-30

LessIsMore is a training-free sparse attention mechanism designed for large reasoning models, addressing the computational cost of long decoding sequences. Its central observation is that token importance during reasoning is global and stable: the tokens that matter are largely shared across attention heads and remain consistent throughout decoding. LessIsMore therefore selects tokens jointly across heads and preserves recent context with a fixed recency window, producing a single globally coherent token set that can be reused across layers. The result is lower latency and memory consumption without costly retraining, while maintaining reasoning accuracy across model families and demanding benchmarks.
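
A minimal sketch of what unified cross-head selection with a recency window could look like, written in PyTorch. The function name, the head-sum aggregation, and the k and recency parameters are illustrative assumptions, not the paper's exact procedure.

    import torch

    def select_unified_tokens(attn_scores: torch.Tensor, k: int, recency: int) -> torch.Tensor:
        """Pick one token set shared by all attention heads.

        attn_scores: [num_heads, seq_len] post-softmax attention weights from
        the current decoding step. Summing over heads is an illustrative way
        to collapse per-head scores into one global ranking.
        """
        _, seq_len = attn_scores.shape
        global_scores = attn_scores.sum(dim=0)  # [seq_len]

        # Stable recency window: the most recent tokens are always kept.
        recent = torch.arange(max(seq_len - recency, 0), seq_len)

        # Exclude the recency window, then take the top-k remaining tokens.
        masked = global_scores.clone()
        masked[recent] = float("-inf")
        topk = masked.topk(min(k, seq_len - len(recent))).indices

        # One coherent index set for every head, reusable across layers.
        return torch.cat([topk, recent]).sort().values

    # Hypothetical usage: 32 heads, 4096 cached tokens, keep 256 + 64 indices.
    scores = torch.softmax(torch.randn(32, 4096), dim=-1)
    kept = select_unified_tokens(scores, k=256, recency=64)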

Key facts

  • LessIsMore is a training-free sparse attention mechanism.
  • It targets long-horizon reasoning in large models.
  • Token importance is global and stable across heads and steps.
  • Cross-head unified token selection is enforced.
  • A stable recency window preserves recent context.
  • The token set is consistent and reusable across layers (a sketch follows this list).
  • It reduces latency and memory usage.
  • No retraining is required.
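
To make the cross-layer reuse concrete, the sketch below applies one shared index set to a single layer's KV cache during a decoding step; the function name, tensor layout, and scaling are assumptions for illustration, not the released implementation.

    import torch

    def sparse_attend(q: torch.Tensor, K: torch.Tensor, V: torch.Tensor,
                      kept_idx: torch.Tensor) -> torch.Tensor:
        """One layer's attention restricted to the shared token set.

        q:        [num_heads, head_dim] current-step query for this layer.
        K, V:     [num_heads, seq_len, head_dim] this layer's KV cache.
        kept_idx: the unified index set; passing the same tensor to every
                  layer means selection cost is paid once per step.
        """
        Ks, Vs = K[:, kept_idx], V[:, kept_idx]  # gather only selected tokens
        scores = torch.einsum("hd,hsd->hs", q, Ks) / Ks.shape[-1] ** 0.5
        return torch.einsum("hs,hsd->hd", torch.softmax(scores, dim=-1), Vs)

Because only the selected key/value rows are gathered, per-step attention compute and memory traffic scale with the size of the kept set rather than the full sequence length, which is where the latency and memory savings described above come from.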
