Nearly Optimal Attention Coresets Achieved

ai-technology · 2026-05-09

A new computer science paper proves the existence of nearly optimal coresets for estimating the Attention mechanism in small space. The result shows that for any set of unit-norm keys and values in ℝ^d, there exists a subset of size at most O(√d e^{ρ+o(ρ)}/ε) that approximates the attention output to within error ε for every query with norm at most ρ, improving on the best previously known results. The paper also provides an improved lower bound of Ω(√d e^ρ/ε).
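
For orientation, the quantities above can be written out as follows. This sketch assumes the standard softmax form of attention; the summary does not state the paper's exact definition or the norm in which the error ε is measured, so treat both as assumptions. The second line only rearranges the two stated bounds to show why the result is called nearly optimal: they differ by a factor e^{o(ρ)}, which grows slower than any exponential in ρ.

```latex
% Assumed (standard) softmax attention for a query q over keys k_i and values v_i,
% with unit-norm keys and values and bounded query norm.
\[
\mathrm{Attn}(q) \;=\; \frac{\sum_{i=1}^{n} e^{\langle q, k_i \rangle}\, v_i}{\sum_{j=1}^{n} e^{\langle q, k_j \rangle}},
\qquad \|k_i\| = \|v_i\| = 1, \quad \|q\| \le \rho .
\]
% The stated upper bound equals the stated lower bound times e^{o(\rho)}.
\[
\frac{\sqrt{d}\, e^{\rho + o(\rho)}}{\varepsilon}
\;=\;
\frac{\sqrt{d}\, e^{\rho}}{\varepsilon} \cdot e^{o(\rho)} .
\]
```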

Key facts

Paper titled 'Nearly Optimal Attention Coresets'
Proves existence of coresets for Attention mechanism
Coreset size: O(√d e^{ρ+o(ρ)}/ε)
Works for unit-norm keys and values in ℝ^d
Approximation error ≤ ε for all queries with norm ≤ ρ
Outperforms best known results
Improved lower bound: Ω(√d e^ρ/ε)
Submitted to arXiv (2605.05602)
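
To make the coreset guarantee concrete, the sketch below measures how far attention over a subset of the keys and values drifts from attention over the full set. It is an illustration only: it assumes standard softmax attention and Euclidean error, it checks a finite sample of queries rather than all queries of norm at most ρ, and it picks the subset uniformly at random, which is a naive baseline and not the paper's construction. All names (`attention`, `max_error`, `random_unit_vector`) and the parameter values are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a vector uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def attention(q, keys, values):
    """Softmax attention: sum_i exp(<q, k_i>) v_i / sum_j exp(<q, k_j>)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    d_out = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(d_out)]


def max_error(keys, values, subset, queries):
    """Largest Euclidean gap between full and subset attention over the queries."""
    sub_keys = [keys[i] for i in subset]
    sub_values = [values[i] for i in subset]
    worst = 0.0
    for q in queries:
        full = attention(q, keys, values)
        approx = attention(q, sub_keys, sub_values)
        worst = max(worst, math.dist(full, approx))
    return worst


rng = random.Random(0)
d, n, m, rho = 8, 400, 40, 1.0  # illustrative sizes, not taken from the paper
keys = [random_unit_vector(d, rng) for _ in range(n)]
values = [random_unit_vector(d, rng) for _ in range(n)]
# Sampled test queries of norm exactly rho (a finite sample, not all queries).
queries = [[rho * x for x in random_unit_vector(d, rng)] for _ in range(50)]
# Uniform random subset: a naive baseline, NOT the paper's coreset construction.
subset = rng.sample(range(n), m)
err = max_error(keys, values, subset, queries)
print(f"max observed error with {m} of {n} keys: {err:.4f}")
```

Because each attention output is a convex combination of unit-norm values, the error can never exceed 2; a useful coreset is one where a small subset keeps it below a chosen ε for every admissible query.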
