ARTFEED — Contemporary Art Intelligence

MuKV: Multi-Grained KV Cache Compression for Long Streaming Video QA

other · 2026-05-23

MuKV is a key-value (KV) cache compression technique for large language models that answer questions over long streaming videos. It extracts visual representations at three granularities (patch, frame, and segment), so the cache preserves both local detail and broader temporal context. A dual-signal token compression mechanism, guided by self-attention and frequency, reduces memory consumption, and a semi-hierarchical retrieval scheme manages both offline and online KV caches. Together these target two problems in streaming video QA: the steadily growing number of visual tokens and the limited reasoning length of LLMs.
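
To make the dual-signal idea concrete, here is a minimal sketch of scoring cached tokens with two signals and keeping only the best ones. It is not the paper's implementation. The function name select_tokens, the min-max normalization, the mixing weight alpha, and the reading of "frequency" as a per-token count (for example, how often a token was among the most attended) are all assumptions made for illustration; the paper may define and combine the two signals differently.

```python
from typing import List, Sequence


def _min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_tokens(
    attn: Sequence[float],
    freq: Sequence[float],
    keep: int,
    alpha: float = 0.5,
) -> List[int]:
    """Return indices of the `keep` highest-scoring tokens, in original order.

    attn  -- hypothetical per-token self-attention signal
    freq  -- hypothetical per-token frequency signal (an assumed definition)
    alpha -- assumed mixing weight between the two normalized signals
    """
    if len(attn) != len(freq):
        raise ValueError("attn and freq must have the same length")
    a, f = _min_max(attn), _min_max(freq)
    scores = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, f)]
    # Rank tokens by combined score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top tokens but restore temporal order for the cache.
    return sorted(ranked[:keep])


kept = select_tokens([0.9, 0.1, 0.5, 0.2], [1, 8, 7, 0], keep=2)
```

Restoring the original order after selection keeps the surviving cache entries in temporal sequence, which matters for a streaming input. With alpha set to 1.0 or 0.0 the sketch degenerates to a single-signal selection, which is a convenient way to compare the two signals in isolation.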

Key facts

  • MuKV is proposed for long streaming video QA.
  • It features multi-grained KV cache compression.
  • Visual representations are extracted at patch, frame, and segment levels.
  • A dual-signal token compression mechanism uses self-attention and frequency.
  • The method includes a semi-hierarchical retrieval approach.
  • It targets both offline and online KV caches.
  • MuKV aims to improve both the efficiency and the accuracy of streaming video QA.
  • The paper is on arXiv with ID 2605.22269.
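
The retrieval step listed above can be pictured as a coarse-to-fine lookup over a multi-grained cache. The sketch below is one plausible reading, not the paper's algorithm: the summary does not say what makes the retrieval "semi-hierarchical", so the Segment structure, the retrieve function, the dot-product similarity, and the two-stage selection (segments first, then frames inside them) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product used as an assumed similarity measure."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Segment:
    key: Vector                # coarse, segment-level summary key
    frame_keys: List[Vector]   # finer, frame-level keys inside the segment


def retrieve(
    query: Vector,
    segments: List[Segment],
    top_segments: int,
    top_frames: int,
) -> List[Tuple[int, int]]:
    """Return (segment_index, frame_index) pairs chosen coarse-to-fine."""
    # Stage 1: shortlist segments by similarity to their coarse keys.
    shortlisted = sorted(
        range(len(segments)),
        key=lambda s: dot(query, segments[s].key),
        reverse=True,
    )[:top_segments]
    # Stage 2: score frames only inside the shortlisted segments.
    candidates = [
        (dot(query, fk), s, f)
        for s in shortlisted
        for f, fk in enumerate(segments[s].frame_keys)
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    # Return the selected frames in temporal order.
    return sorted((s, f) for _, s, f in candidates[:top_frames])


cache = [
    Segment([1.0, 0.0], [[1.0, 0.0], [0.5, 0.5]]),
    Segment([0.0, 1.0], [[0.0, 1.0], [0.1, 0.9]]),
    Segment([0.7, 0.7], [[0.9, 0.1], [0.2, 0.8]]),
]
hits = retrieve([1.0, 0.0], cache, top_segments=2, top_frames=2)
```

Scoring frames only within shortlisted segments keeps the per-query cost proportional to the shortlist rather than to the full video length, which is the usual motivation for a hierarchical lookup over a growing cache.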
