MuKV: Multi-Grained KV Cache Compression for Long Streaming Video QA
MuKV is a technique for compressing the key-value (KV) cache of large language models so that they can answer questions about long streaming videos. It extracts visual representations at three granularities (patch, frame, and segment), preserving both local detail and broader temporal context. A dual-signal token compression method, guided by self-attention and frequency, reduces memory consumption. A semi-hierarchical retrieval strategy manages both the offline and online KV caches. Together, these address two problems in streaming video QA: the ever-growing number of visual tokens and the limited reasoning length of LLMs.
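To make the dual-signal idea concrete, here is a minimal sketch of token selection driven by two per-token scores. It is not the paper's algorithm: the summary does not say how the frequency signal is computed or how the two signals are combined, so this sketch assumes both scores are supplied as plain lists and mixes them with a hypothetical weight `alpha`. The function name `select_tokens` and its parameters are illustrative only.

```python
from typing import List, Sequence


def _min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_tokens(
    attn_scores: Sequence[float],
    freq_scores: Sequence[float],
    keep: int,
    alpha: float = 0.5,
) -> List[int]:
    """Return the indices of the tokens to keep, in their original order.

    attn_scores: assumed per-token importance from self-attention.
    freq_scores: assumed per-token frequency signal (definition not given
                 in the summary; treated here as an opaque score).
    keep:        number of tokens to retain.
    alpha:       hypothetical mixing weight (1.0 = attention only).
    """
    if len(attn_scores) != len(freq_scores):
        raise ValueError("score lists must have the same length")
    if not attn_scores or keep <= 0:
        return []

    # Normalize each signal so neither dominates purely because of scale.
    attn = _min_max(attn_scores)
    freq = _min_max(freq_scores)
    combined = [alpha * a + (1.0 - alpha) * f for a, f in zip(attn, freq)]

    # Rank tokens by combined score, keep the best, then restore
    # temporal order so the compressed cache stays sequential.
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:keep])
```

Calling `select_tokens(attn, freq, keep=k)` yields the positions whose key and value entries would be retained; everything else would be dropped from the cache. Setting `alpha` to 1.0 or 0.0 falls back to a single-signal policy, which is a convenient baseline for comparison.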
Key facts
- MuKV is proposed for long streaming video QA.
- It features multi-grained KV cache compression.
- Visual representations are extracted at patch, frame, and segment levels.
- A dual-signal token compression mechanism uses self-attention and frequency signals.
- The method includes a semi-hierarchical retrieval approach.
- It targets both offline and online KV caches.
- MuKV aims to improve both memory efficiency and answer accuracy in streaming video QA.
- The paper is on arXiv with ID 2605.22269.
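The multi-grained levels and the retrieval step listed above can be illustrated with a generic coarse-to-fine lookup. This is only one plausible reading: the summary does not describe how the semi-hierarchical retrieval works, so the sketch below assumes that each segment carries a summary vector plus one vector per frame, and that relevance is a plain dot product with a query vector. `Segment`, `retrieve`, `top_segments`, and `top_frames` are hypothetical names, and the patch level is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    key: List[float]               # assumed segment-level summary vector
    frame_keys: List[List[float]]  # assumed one vector per frame


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores (ties keep the earlier index first)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def retrieve(
    query: Sequence[float],
    segments: Sequence[Segment],
    top_segments: int,
    top_frames: int,
) -> List[Tuple[int, int]]:
    """Return (segment_index, frame_index) pairs in temporal order.

    Step 1 scores only the coarse segment vectors; step 2 scores frames,
    but only inside the segments that survived step 1.
    """
    seg_scores = [dot(query, s.key) for s in segments]
    hits: List[Tuple[int, int]] = []
    for si in sorted(top_k(seg_scores, top_segments)):
        frame_scores = [dot(query, fk) for fk in segments[si].frame_keys]
        for fi in sorted(top_k(frame_scores, top_frames)):
            hits.append((si, fi))
    return hits
```

The point of the two-step search is cost: frame-level vectors are compared only within a handful of candidate segments, so the number of comparisons grows with the number of segments rather than with the total number of frames seen so far.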