ARTFEED — Contemporary Art Intelligence

Adaptive Mass-Segmented KV Compression for Long-Context Reasoning

ai-technology · 2026-05-25

arXiv paper 2605.23200 introduces Adaptive Mass-Segmented (AMS) KV Compression, a method for containing the Key-Value (KV) cache, which grows linearly with sequence length during long-form LLM inference. The authors observe that existing compression methods based on global Top-k token selection cause what they call Region Wipe-out: contiguous reasoning blocks are largely or entirely evicted, which disrupts logical coherence. AMS replaces token-level competition with region-aware quota allocation. It adaptively partitions the KV cache according to how attention mass is distributed across positions, so that vital reasoning segments each receive a guaranteed share of the memory budget. An EMA-based smoothing mechanism keeps segment boundaries from jittering across iterative decoding steps. AMS is designed as a universal plug-and-play layer that is orthogonal to existing token scorers and can therefore be combined with them.
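To make the contrast between global Top-k and region-aware quotas concrete, here is a minimal Python sketch. It is not the paper's algorithm: the per-token scores stand in for whatever existing scorer AMS is layered on, and the partitioning rule (contiguous segments holding roughly equal cumulative attention mass), the equal per-segment quota, and the function names `global_topk`, `mass_segments`, and `segmented_quota` are all hypothetical choices made for illustration.

```python
import numpy as np


def global_topk(scores: np.ndarray, budget: int) -> list[int]:
    """Baseline: keep the `budget` highest-scoring positions cache-wide."""
    if budget <= 0:
        return []
    return sorted(int(i) for i in np.argsort(scores)[-budget:])


def mass_segments(scores: np.ndarray, num_segments: int) -> list[tuple[int, int]]:
    """Hypothetical partition: cut positions [0, n) into contiguous spans that
    each hold roughly 1/num_segments of the total (non-negative) attention mass."""
    cum = np.cumsum(scores) / scores.sum()
    targets = np.arange(1, num_segments) / num_segments
    cuts = np.searchsorted(cum, targets, side="right")
    # Deduplicate the edges so that no segment is empty.
    edges = np.unique(np.concatenate(([0], cuts, [len(scores)])))
    return [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:])]


def segmented_quota(scores: np.ndarray, budget: int, num_segments: int) -> list[int]:
    """Give each segment an equal quota, pick Top-k inside each segment,
    then spend any leftover budget on the best remaining positions."""
    segments = mass_segments(scores, num_segments)
    quota = budget // len(segments)
    chosen: set[int] = set()
    for start, end in segments:
        k = min(quota, end - start)
        if k > 0:  # guard: argsort(...)[-0:] would select every position
            local = np.argsort(scores[start:end])[-k:]
            chosen.update(start + int(i) for i in local)
    leftover = budget - len(chosen)
    if leftover > 0:
        ranked = [int(i) for i in np.argsort(scores)[::-1] if int(i) not in chosen]
        chosen.update(ranked[:leftover])
    return sorted(chosen)


# A low-scoring reasoning block (positions 0-5) followed by a high-scoring one.
scores = np.array([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5])
print(global_topk(scores, 6))         # [6, 7, 8, 9, 10, 11]
print(segmented_quota(scores, 6, 3))  # [5, 6, 7, 8, 10, 11]
```

With a budget of six tokens, global Top-k keeps only the high-scoring tail and evicts positions 0 to 5 entirely, a toy version of Region Wipe-out, while the segmented variant still retains a token from the earlier block. Because cuts fall at equal-mass quantiles, high-attention regions split into more segments and so collect more of the total budget, while low-mass regions still get a nonzero floor.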

Key facts

  • arXiv paper 2605.23200 proposes Adaptive Mass-Segmented (AMS) KV Compression
  • Targets the linear growth of the KV cache in long-form LLM inference
  • Existing global Top-k selection causes Region Wipe-out, the eviction of contiguous reasoning blocks
  • AMS shifts from token-level competition to region-aware quota allocation
  • Partitions the KV cache based on the spatial distribution of attention mass
  • EMA-based smoothing mechanism prevents segment boundaries from jittering across decoding steps
  • AMS is a universal plug-and-play layer orthogonal to existing scorers
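The EMA smoothing of segment boundaries can be pictured with a small sketch. The exact update rule used in the paper is not given here, so this is an assumption: a conventional exponential moving average over the segment cut positions, with a hypothetical decay `beta` and a hypothetical class `BoundarySmoother` that resets whenever the number of segments changes.

```python
from typing import Optional

import numpy as np


class BoundarySmoother:
    """Hypothetical EMA over segment cut positions across decoding steps."""

    def __init__(self, beta: float = 0.9):
        self.beta = beta  # weight given to the previous smoothed boundaries
        self.state: Optional[np.ndarray] = None

    def update(self, raw_cuts: list[int]) -> list[int]:
        raw = np.asarray(raw_cuts, dtype=float)
        if self.state is None or self.state.shape != raw.shape:
            # First step, or the segment count changed: start from the raw cuts.
            self.state = raw.copy()
        else:
            # Standard EMA: mostly keep the old cuts, move partway toward the new ones.
            self.state = self.beta * self.state + (1.0 - self.beta) * raw
        # Cut positions index tokens, so round back to integers.
        return np.round(self.state).astype(int).tolist()


smoother = BoundarySmoother(beta=0.8)
for raw in ([40, 80], [60, 80], [40, 80]):
    print(raw, "->", smoother.update(raw))
# [40, 80] -> [40, 80]
# [60, 80] -> [44, 80]
# [40, 80] -> [43, 80]
```

A one-step jump of 20 positions moves the smoothed cut by only 4, and the disturbance decays instead of flipping back and forth. Because each update is a convex combination of two sorted cut lists, the smoothed cuts stay in order, although rounding can still make two adjacent cuts coincide.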

Entities

Institutions

  • arXiv
