Fixed-Contract Diagnostic for KV Cache Compression in LLMs

other · 2026-05-12

Researchers have proposed a diagnostic for value-aware key-value (KV) cache eviction in long-context large language model inference. The method, described in a recent arXiv paper, distinguishes three failure modes: missing evidence, irrelevant high scores, and broken related evidence. It holds the selector's setup fixed and changes one decision slot at a time. It also combines each block's attention mass with the estimated change in output from removing that block. On LongBench, across three models and two budgets, the probe is positive on 72.6% of positive-margin cells, compared with 32.4% of nonpositive-margin cells. NeedleBench M-RT at 32k supports closure under branched retrieval.
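The following minimal Python sketch shows how a block's attention mass can be paired with the output change caused by removing it. It makes several assumptions. It uses single-head dot-product attention over a toy key/value cache. Blocks are given as lists of token positions. Output change is measured as the L2 distance between the full and ablated attention outputs. The two quantities are combined by a simple product. The function names and the product rule are hypothetical illustrations, not the paper's formulation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector],
           keep: Sequence[bool]) -> Tuple[Vector, Dict[int, float]]:
    """Single-head dot-product attention restricted to the kept positions."""
    idx = [i for i, flag in enumerate(keep) if flag]
    if not idx:
        raise ValueError("at least one position must stay in the cache")
    weights = softmax([sum(q * k for q, k in zip(query, keys[i])) for i in idx])
    out = [0.0] * len(values[0])
    for w, i in zip(weights, idx):
        for d, v in enumerate(values[i]):
            out[d] += w * v
    return out, dict(zip(idx, weights))


def probe_scores(query: Vector, keys: List[Vector], values: List[Vector],
                 blocks: List[List[int]]) -> List[Tuple[float, float, float]]:
    """Per block: (attention mass, output change if evicted, combined score)."""
    n = len(keys)
    full_out, weights = attend(query, keys, values, [True] * n)
    results = []
    for block in blocks:
        members = set(block)
        mass = sum(weights[i] for i in members)
        # Re-run attention with this block evicted and measure the L2 shift.
        ablated_out, _ = attend(query, keys, values,
                                [i not in members for i in range(n)])
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(full_out, ablated_out)))
        # HYPOTHETICAL combination rule: attention mass times output change.
        results.append((mass, delta, mass * delta))
    return results
```

For example, use query `[1.0, 0.0]`, keys `[[4.0, 0.0], [4.0, 0.0], [0.0, 0.0], [0.0, 0.0]]`, and the value `[1.0, 1.0]` at every position. The block `[0, 1]` then receives about 98% of the attention mass, yet evicting it leaves the output unchanged, so its combined score is essentially zero. This toy case loosely resembles the "irrelevant high scores" failure mode: attention alone would rank the block highly even though it does not affect the output.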

Key facts

arXiv:2605.08234 introduces a fixed-contract diagnostic for KV cache compression.
The diagnostic identifies three failure modes: missing evidence, irrelevant high scores, and broken related evidence.
The probe combines a block's attention mass with the estimated output change from removing that block.
On LongBench across three models and two budgets, the probe is positive on 72.6% of positive-margin cells.
The probe is positive on 32.4% of nonpositive-margin cells.
NeedleBench M-RT at 32k supports closure under branched retrieval.
A RULER check at 8k also supports closure under branched retrieval.
The method holds the selector's setup fixed and changes one decision slot at a time.
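The fixed-contract idea keeps the selector's setup and budget fixed and changes a single decision slot. One way to sketch it is to enumerate every kept set that differs from the selector's choice in exactly one block. This sketch assumes the following. Blocks are identified by integers. The "decision slots" are the kept blocks. A caller-supplied quality function scores a kept set. The margin of a swap is its quality minus the quality of the selector's own set. All names are hypothetical, and the quality function stands in for whatever output-level metric the paper actually uses.

```python
from itertools import product
from typing import Callable, FrozenSet, Iterator, List, Sequence, Tuple

KeptSet = FrozenSet[int]


def one_slot_swaps(kept: KeptSet,
                   candidates: Sequence[int]) -> Iterator[Tuple[int, int, KeptSet]]:
    """Yield (evicted_block, admitted_block, new_kept_set) for each one-slot change.

    The budget stays fixed: every new set has exactly len(kept) blocks.
    """
    outside = [c for c in candidates if c not in kept]
    for out_blk, in_blk in product(sorted(kept), outside):
        yield out_blk, in_blk, (kept - {out_blk}) | {in_blk}


def swap_margins(kept: KeptSet, candidates: Sequence[int],
                 quality: Callable[[KeptSet], float]) -> List[Tuple[int, int, float]]:
    """Margin of each one-slot swap relative to the selector's own kept set."""
    base = quality(kept)
    return [(o, i, quality(new) - base)
            for o, i, new in one_slot_swaps(kept, candidates)]
```

For example, take per-block utilities `{0: 1.0, 1: 0.1, 2: 0.5, 3: 0.0}`, an additive quality function, and a selector that keeps `{0, 1}`. The best swap replaces block 1 with block 2, for a margin of about +0.4. Every other swap has a negative margin. Changing only one slot keeps each comparison attributable to a single decision.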
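The 72.6% and 32.4% figures are conditional rates. They give the fraction of cells where the probe is positive, computed separately for positive-margin cells and for nonpositive-margin cells. The minimal sketch below tallies those two rates. It assumes each cell (for example, one model, budget, and task combination) reduces to a (margin, probe value) pair, and that "positive" means strictly greater than zero. The function name and the toy data are illustrative only, not results from the paper.

```python
from typing import Iterable, List, Tuple


def conditional_positive_rates(cells: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (rate on positive-margin cells, rate on nonpositive-margin cells).

    Each rate is the fraction of cells in that group whose probe value is > 0.
    An empty group yields NaN.
    """
    pos: List[bool] = []
    nonpos: List[bool] = []
    for margin, probe in cells:
        (pos if margin > 0 else nonpos).append(probe > 0)

    def rate(flags: List[bool]) -> float:
        return sum(flags) / len(flags) if flags else float("nan")

    return rate(pos), rate(nonpos)
```

Take the toy cells `[(0.2, 1.0), (0.1, -0.5), (0.3, 0.4), (0.0, 0.1), (-0.2, -1.0)]`. The function returns 2/3 for the positive-margin group and 1/2 for the nonpositive-margin group. Comparing the two rates, rather than reading either one alone, shows how much more often the probe fires where the margin is positive.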
