Nearly Optimal Attention Coresets Achieved
A new computer science paper proves that the Attention mechanism admits nearly optimal coresets, meaning small subsets of the key-value pairs that suffice to estimate its output. For any set of unit-norm keys and values in ℝ^d, there exists a subset of size at most O(√d e^{ρ+o(ρ)}/ε) whose attention output is within error ε of the full output for every query with norm at most ρ, improving on the best previously known results. The paper also gives an improved lower bound of Ω(√d e^ρ/ε), so the upper bound is tight up to the e^{o(ρ)} factor in the exponent.
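To make the guarantee concrete, here is a minimal Python sketch of the quantity being bounded: the gap between attention computed over all key-value pairs and attention computed over a subset. It assumes the standard softmax definition of attention and measures the gap in Euclidean norm; the summary does not state which norm the paper uses, so that choice is an assumption. The function names are hypothetical, and the uniformly random subset is only a stand-in for illustration, not the paper's coreset construction.

```python
import math
import random


def unit_vector(d, rng):
    """Draw a random unit-norm vector in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def attention(query, keys, values):
    """Softmax attention for one query: sum_i softmax(<k_i, q>)_i * v_i."""
    scores = [sum(k * q for k, q in zip(key, query)) for key in keys]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def coreset_error(query, keys, values, subset):
    """Euclidean distance between full attention and attention on `subset`.

    Assumption: the error is measured in the Euclidean norm.
    """
    full = attention(query, keys, values)
    approx = attention(query,
                       [keys[i] for i in subset],
                       [values[i] for i in subset])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(full, approx)))


rng = random.Random(0)
n, d, rho = 200, 8, 2.0

keys = [unit_vector(d, rng) for _ in range(n)]      # unit-norm keys
values = [unit_vector(d, rng) for _ in range(n)]    # unit-norm values
query = [rho * x for x in unit_vector(d, rng)]      # query of norm rho

# Uniform sampling is a placeholder, NOT the construction from the paper.
subset = rng.sample(range(n), 50)
print("error on a random 50-point subset:",
      coreset_error(query, keys, values, subset))
```

A coreset in the sense of the paper is a subset for which this error stays at most ε simultaneously for every query with norm at most ρ, not just for one sampled query as checked here.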
Key facts
- Paper titled 'Nearly Optimal Attention Coresets'
- Proves existence of coresets for Attention mechanism
- Coreset size: O(√d e^{ρ+o(ρ)}/ε)
- Works for unit-norm keys and values in ℝ^d
- Approximation error ≤ ε for all queries with norm ≤ ρ
- Outperforms best known results
- Improved lower bound: Ω(√d e^ρ/ε)
- Submitted to arXiv (2605.05602)
Entities
Institutions
- arXiv