MuKV: Multi-Grained KV Cache Compression for Long Streaming Video QA

other · 2026-05-23

MuKV is a key-value (KV) cache compression technique for large language models (LLMs) that targets question answering over long streaming videos. It extracts multi-grained visual representations at the patch, frame, and segment levels, preserving both local detail and broader temporal context. A dual-signal token compression mechanism, guided by self-attention and frequency, reduces memory consumption. A semi-hierarchical retrieval scheme manages both the offline and online KV caches. Together, these address two problems in streaming video QA: the growing number of visual tokens and the limited reasoning length of LLMs.
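The summary does not say how the two compression signals are computed or combined, so the following is only a minimal sketch of the general idea, not MuKV's actual method. It assumes that the attention signal is the mean attention a cached token receives, that "frequency" means how often a token lands in a query's top-k attended tokens (one possible reading among several), and that the two are mixed with a weight `alpha`. The function name `dual_signal_keep` and all parameters are hypothetical.

```python
from typing import List


def dual_signal_keep(
    attn: List[List[float]], keep: int, top_k: int = 2, alpha: float = 0.5
) -> List[int]:
    """Return the indices of the `keep` cached tokens with the best combined score.

    attn[q][t] is the attention weight that query q places on cached token t.
    Both signals below are illustrative assumptions, not taken from the paper.
    """
    n_queries, n_tokens = len(attn), len(attn[0])

    # Signal 1 (assumed): mean attention each token receives over all queries.
    attn_score = [sum(row[t] for row in attn) / n_queries for t in range(n_tokens)]

    # Signal 2 (assumed): fraction of queries whose top-k tokens include this one.
    hits = [0] * n_tokens
    for row in attn:
        top = sorted(range(n_tokens), key=lambda t: row[t], reverse=True)[:top_k]
        for t in top:
            hits[t] += 1
    freq_score = [h / n_queries for h in hits]

    # Scale the attention signal by its maximum so both signals lie in [0, 1].
    max_attn = max(attn_score) or 1.0
    combined = [
        alpha * a / max_attn + (1 - alpha) * f
        for a, f in zip(attn_score, freq_score)
    ]

    # Keep the highest-scoring tokens, returned in their original (temporal) order.
    kept = sorted(range(n_tokens), key=lambda t: combined[t], reverse=True)[:keep]
    return sorted(kept)


attn = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]
kept = dual_signal_keep(attn, keep=2, top_k=1)
```

In this toy input, tokens 0 and 2 are the only ones that ever rank first for a query, so they are the two that survive. A real system would apply such a rule per layer and per head to the cached key and value tensors.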

Key facts

MuKV is proposed for long streaming video QA.
It features multi-grained KV cache compression.
Visual representations are extracted at patch, frame, and segment levels.
A dual-signal token compression mechanism uses self-attention and frequency as its two signals.
The method includes a semi-hierarchical retrieval approach.
It targets both offline and online KV caches.
MuKV aims to improve both memory efficiency and answer accuracy.
The paper is on arXiv with ID 2605.22269.
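The facts above name a semi-hierarchical retrieval approach over offline and online KV caches without describing it. As a rough illustration only, the sketch below assumes a coarse-to-fine search over the offline cache (rank segments first, then frames inside the chosen segments) while the online cache of recent frames bypasses the hierarchy and is always included. The `Segment` class, the dot-product scoring, and every name here are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used here as an assumed relevance score."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Segment:
    key: List[float]               # segment-level summary vector (assumed)
    frame_keys: List[List[float]]  # one summary vector per frame (assumed)


def retrieve(
    query: Sequence[float],
    offline: List[Segment],
    online: List[str],
    top_segments: int = 1,
    top_frames: int = 1,
) -> Tuple[List[Tuple[int, int]], List[str]]:
    """Pick (segment, frame) pairs from the offline cache and keep all online entries."""
    # Coarse step: rank offline segments by similarity to the query.
    seg_rank = sorted(
        range(len(offline)), key=lambda s: dot(query, offline[s].key), reverse=True
    )[:top_segments]

    # Fine step: inside each chosen segment, rank its frames.
    picked: List[Tuple[int, int]] = []
    for s in sorted(seg_rank):
        frames = offline[s].frame_keys
        frame_rank = sorted(
            range(len(frames)), key=lambda f: dot(query, frames[f]), reverse=True
        )[:top_frames]
        picked.extend((s, f) for f in sorted(frame_rank))

    # The online (recent) cache skips the hierarchy in this sketch.
    return picked, list(online)


offline = [
    Segment(key=[1.0, 0.0], frame_keys=[[1.0, 0.0], [0.5, 0.5]]),
    Segment(key=[0.0, 1.0], frame_keys=[[0.0, 1.0], [0.2, 0.9]]),
]
picked, recent = retrieve([0.0, 1.0], offline, online=["frame_t-1", "frame_t"])
```

Here the query matches the second segment, so only its best frame is fetched from the offline cache, and both recent frames are passed through unchanged.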

Entities

—

Sources

arXiv cs.AI — 2026-05-23