SAVEMem: Semantic-Aware Memory for Streaming Video
SAVEMem (Semantic-Aware Adaptive Visual Memory) is a training-free, dual-stage framework for memory management in online streaming video understanding. It processes a continuous visual stream while answering user queries in real time. Unlike existing approaches, which compress tokens with visual-similarity heuristics or augment compression with KV-cache-level retrieval, SAVEMem injects semantic signals into memory construction and adapts the retrieval scope to each query. In its first stage, it builds a three-tier streaming memory under a constant memory budget, using a fixed pseudo-question bank as a lightweight semantic prior, so that long-term retention is driven by semantic salience rather than visual similarity. The framework handles unbounded streams and arbitrary query timing without any training. The paper is available on arXiv under ID 2605.07897.
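The mechanics described above — a tiered memory under a fixed budget, salience scored against a pseudo-question bank, and query-adaptive retrieval — can be illustrated with a minimal sketch. This is not the paper's implementation: the class names, tier capacities, and the use of raw cosine similarity over toy feature vectors are all illustrative assumptions standing in for a real visual encoder and the paper's actual scoring.

```python
import heapq
from dataclasses import dataclass


def cosine(a, b):
    # plain cosine similarity over Python lists (stand-in for a real metric)
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Frame:
    t: int          # arrival timestamp in the stream
    feat: list      # frame embedding (stand-in for a visual encoder output)


class ThreeTierMemory:
    """Toy three-tier streaming memory: short-term holds recent frames,
    mid-term holds frames aged out of short-term, and long-term keeps
    frames by semantic salience against a fixed pseudo-question bank."""

    def __init__(self, question_bank, short_cap=4, mid_cap=4, long_cap=4):
        self.bank = question_bank          # pseudo-question embeddings (semantic prior)
        self.short, self.mid, self.long = [], [], []
        self.caps = (short_cap, mid_cap, long_cap)

    def salience(self, frame):
        # semantic salience = best match against any pseudo-question
        return max(cosine(frame.feat, q) for q in self.bank)

    def ingest(self, frame):
        # new frames enter short-term; overflow cascades down the tiers,
        # so total memory never exceeds the fixed budget
        self.short.append(frame)
        if len(self.short) > self.caps[0]:
            self.mid.append(self.short.pop(0))
        if len(self.mid) > self.caps[1]:
            self.long.append(self.mid.pop(0))
            if len(self.long) > self.caps[2]:
                # evict the least semantically salient long-term frame
                self.long.sort(key=self.salience)
                self.long.pop(0)

    def retrieve(self, query_feat, k=3):
        # query-adaptive retrieval: rank frames from all tiers by similarity
        pool = self.short + self.mid + self.long
        return heapq.nlargest(k, pool, key=lambda f: cosine(f.feat, query_feat))
```

The key property the sketch preserves is that long-term eviction compares frames by semantic salience (match to the pseudo-question bank), not by visual similarity to their neighbors, while retrieval scores every surviving frame against the specific query.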
Key facts
- SAVEMem is a training-free dual-stage framework for streaming video understanding.
- It brings semantic awareness into memory generation.
- Retrieval scope adapts per query.
- Stage 1 builds a three-tier streaming memory under constant budget.
- A fixed pseudo-question bank provides semantic prior.
- Long-term retention is shaped by semantic salience.
- Existing methods use visual similarity heuristics or KV-cache-level retrieval.
- Paper ID: arXiv:2605.07897.