Meta-Soft: Dynamic KV Cache Compression for LLMs

ai-technology · 2026-05-23

Meta-Soft is a new framework for compressing the key-value (KV) cache in large language models (LLMs). It targets two problems of long-context processing: memory blow-up and reduced decoding efficiency. Existing methods such as Judge Q use fixed soft tokens and static parameters; Meta-Soft instead compresses dynamically, using probe-driven context integration. It builds a meta-library with a learnable orthogonal basis matrix, and a selector network with Gumbel-Softmax produces differentiable sparse combinations. This lets the model adapt to different input prompts and precisely capture task relevance, and it avoids the irreversible information loss caused by permanently evicting KV pairs.
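To make the selection step more concrete, here is a minimal sketch of a Gumbel-Softmax combination over an orthonormal basis. It is not the authors' implementation. It assumes the combination is taken over basis vectors, uses Gram-Schmidt on random vectors as a stand-in for the learned orthogonal basis, and supplies hand-picked logits where a selector network would produce them. The names `gram_schmidt`, `gumbel_softmax`, and `soft_token` are hypothetical.

```python
import math
import random

def gram_schmidt(vectors):
    """Orthonormalize vectors (stand-in for a learned orthogonal basis)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def soft_token(basis, logits, tau, rng):
    """Weighted sum of basis vectors using Gumbel-Softmax weights."""
    weights = gumbel_softmax(logits, tau, rng)
    dim = len(basis[0])
    token = [sum(w * b[j] for w, b in zip(weights, basis)) for j in range(dim)]
    return weights, token

rng = random.Random(0)
raw = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
basis = gram_schmidt(raw)            # 4 orthonormal vectors of dimension 8
logits = [2.0, 0.5, -1.0, 0.0]       # hypothetical selector output
weights, token = soft_token(basis, logits, tau=0.5, rng=rng)
```

Lowering `tau` pushes the weights toward a single basis vector, which is what makes the combination approximately sparse while every step stays differentiable in the logits.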

Key facts

Meta-Soft is a dynamic KV cache compression framework for LLMs.
It addresses memory blow-up and reduced decoding efficiency in long contexts.
Existing methods like Judge Q use fixed soft tokens and static parameters.
Meta-Soft uses a probe-driven context integration approach.
It builds a meta-library with a learnable orthogonal basis matrix.
A selector network with Gumbel-Softmax produces differentiable sparse combinations.
The framework adapts dynamically to different input prompts.
It prevents irreversible information loss from permanent KV pair eviction.
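The memory blow-up mentioned above follows from simple arithmetic: a standard KV cache stores one key and one value vector per token, per layer, per head. The sketch below works through that calculation. The model shape and the 1,024-slot budget are hypothetical illustration values, not figures reported for Meta-Soft.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, batch=1):
    """Bytes needed to cache keys and values for seq_len tokens."""
    # Factor 2: one key vector and one value vector per token, layer, and head.
    return 2 * batch * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical model shape with 16-bit values (not taken from the source).
shape = dict(n_layers=32, n_kv_heads=32, head_dim=128)

full = kv_cache_bytes(32_768, **shape)    # uncompressed 32K-token context
compact = kv_cache_bytes(1_024, **shape)  # hypothetical budget of 1,024 cached slots

GIB = 1024 ** 3
print(f"full: {full / GIB:.1f} GiB, compact: {compact / GIB:.2f} GiB")
# prints "full: 16.0 GiB, compact: 0.50 GiB"
```

Because the cache grows linearly with sequence length, any scheme that holds the number of cached slots fixed reduces memory by the ratio of context length to budget, here a factor of 32.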
