ARTFEED — Contemporary Art Intelligence

TokenButler Predicts Critical Tokens in LLM KV-Cache

ai-technology · 2026-05-18

Researchers have introduced TokenButler, a query-aware predictor that identifies critical tokens in the Key-Value (KV) Cache of Large Language Models (LLMs). The KV-Cache stores token history for efficient decoding, but as it grows with sequence length it becomes a memory and computation bottleneck. Prior work shows that only a small subset of tokens contributes meaningfully to each decoding step, and that this subset is dynamic and input-dependent. Existing methods either permanently evict tokens, which risks degrading output quality, or retain the full cache and apply retrieval-based sparsity guided by inaccurate proxies of token importance. TokenButler learns to predict low-dimensional importance queries at a fixed depth stride, enabling high-granularity, query-aware token selection. The paper is available on arXiv under ID 2503.07518.
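The general idea of query-aware token selection can be illustrated with a small sketch. This is not TokenButler's implementation: the function names (`attention_scores`, `select_top_k`, `sparse_attention`) and the toy cache are hypothetical, and the exact attention logits stand in for the importance scores that a learned predictor would estimate. The sketch only shows that the set of tokens worth keeping changes with the query.

```python
import math

def attention_scores(query, keys):
    """Scaled dot-product logits of one query against every cached key."""
    scale = 1.0 / math.sqrt(len(query))
    return [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_top_k(importance, k):
    """Indices of the k highest-importance tokens, returned in cache order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])

def sparse_attention(query, keys, values, k):
    """Attend only over the k cached tokens that score highest for this query.

    Assumption: the exact logits are used as the importance signal here.
    A learned predictor would replace them with a cheaper estimate.
    """
    logits = attention_scores(query, keys)
    keep = select_top_k(logits, k)
    weights = softmax([logits[i] for i in keep])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]
    return keep, out

# Toy KV-Cache with four tokens and two-dimensional keys and values (made-up data).
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]

keep_a, out_a = sparse_attention([1.0, 0.0], keys, values, k=2)
keep_b, out_b = sparse_attention([0.0, 1.0], keys, values, k=2)
print(keep_a)  # [0, 2]
print(keep_b)  # [1, 2]
```

The two queries keep different tokens from the same cache, which is why a fixed, permanent eviction rule can discard tokens that a later query would have needed.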

Key facts

  • TokenButler is a query-aware predictor for critical tokens in LLM KV-Cache.
  • The KV-Cache stores token history for efficient decoding but becomes a memory and computation bottleneck as it grows.
  • Only a small subset of tokens contributes meaningfully to each decoding step.
  • Critical tokens are dynamic and depend heavily on the input query.
  • Existing methods either evict tokens permanently or use inaccurate proxies.
  • TokenButler predicts low-dimensional importance queries at a fixed depth stride.
  • The paper is on arXiv with ID 2503.07518.
  • TokenButler offers high-granularity, query-aware token selection.
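
The bottleneck mentioned above can be made concrete with a back-of-the-envelope calculation. The sketch below uses the standard sizing rule for a KV-Cache (one key tensor and one value tensor per layer); the model dimensions in `CONFIG` are hypothetical and are not taken from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    """Approximate KV-Cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes 16-bit floating point storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)

# Hypothetical model dimensions, chosen only for illustration.
CONFIG = dict(num_layers=32, num_kv_heads=32, head_dim=128)

for seq_len in (4096, 32768, 131072):
    gib = kv_cache_bytes(seq_len=seq_len, **CONFIG) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:.1f} GiB")
# Prints 2.0, 16.0 and 64.0 GiB for the three sequence lengths.
```

Under these assumptions the cache grows linearly with sequence length, at 0.5 MiB per token, which is why attending to only a small selected subset of tokens at each decoding step is attractive.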

Entities

Institutions

  • arXiv