New AI Architecture MM-Mem Uses Semantic Information Bottleneck for Long-Horizon Video Understanding
Researchers have developed MM-Mem, a pyramidal multimodal memory architecture aimed at the difficulties multimodal large language models face in long-horizon video comprehension. The system organizes memory into three hierarchical tiers: a Sensory Buffer, an Episodic Stream, and a Symbolic Schema. This structure supports progressive distillation, transforming fine-grained perceptual traces into high-level semantic schemas. Grounded in Fuzzy-Trace Theory, the architecture incorporates a Semantic Information Bottleneck that governs how memory is dynamically constructed.

While current multimodal models excel at short-term reasoning, they falter in long-horizon video analysis because of limited context windows and inflexible memory systems. Existing strategies tend toward one of two extremes: vision-centric approaches that retain too much visual data, incurring latency and redundancy, or text-centric approaches that discard detail and invite hallucinations. MM-Mem aims to reconcile these trade-offs. The research was published on arXiv under the identifier arXiv:2603.01455v3, listed in the replace-cross category.
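The article does not describe MM-Mem's implementation, so the following is only a minimal Python sketch of how a three-tier pyramidal memory with capacity-triggered distillation might be organized. All names (PyramidalMemory, observe, the capacity thresholds, the salience cutoff) are illustrative assumptions, not the paper's published interface.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a three-tier pyramidal memory.
# Class names, capacities, and the distillation hooks below are
# illustrative assumptions, not MM-Mem's published design.

@dataclass
class MemoryItem:
    timestamp: float
    content: str          # e.g., a caption, embedding ID, or schema label
    salience: float = 0.0

@dataclass
class PyramidalMemory:
    sensory_capacity: int = 64      # fine-grained perceptual traces
    episodic_capacity: int = 256    # consolidated event-level records
    sensory_buffer: list = field(default_factory=list)
    episodic_stream: list = field(default_factory=list)
    symbolic_schema: list = field(default_factory=list)

    def observe(self, item: MemoryItem) -> None:
        """Add a raw perceptual trace; consolidate when the buffer fills."""
        self.sensory_buffer.append(item)
        if len(self.sensory_buffer) > self.sensory_capacity:
            self._consolidate()

    def _consolidate(self) -> None:
        """Promote salient traces to the episodic stream, drop the rest."""
        salient = [it for it in self.sensory_buffer if it.salience > 0.5]
        self.episodic_stream.extend(salient)
        self.sensory_buffer.clear()
        if len(self.episodic_stream) > self.episodic_capacity:
            self._abstract()

    def _abstract(self) -> None:
        """Distill episodic records into a compact symbolic summary.

        A real system would use an LLM or clustering here; a placeholder
        summary item stands in to show the control flow."""
        summary = MemoryItem(
            timestamp=self.episodic_stream[-1].timestamp,
            content=f"schema over {len(self.episodic_stream)} events",
        )
        self.symbolic_schema.append(summary)
        self.episodic_stream.clear()
```

The point of the sketch is the direction of flow: information only moves upward, from many cheap perceptual traces to few compact schemas, which is what lets the resident memory stay bounded over arbitrarily long videos.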
Key facts
- MM-Mem is a pyramidal multimodal memory architecture for long-horizon video understanding
- The architecture structures memory hierarchically into Sensory Buffer, Episodic Stream, and Symbolic Schema
- It enables progressive distillation from fine-grained perceptual traces to high-level semantic schemas
- The system is grounded in Fuzzy-Trace Theory
- A Semantic Information Bottleneck governs dynamic memory construction (a reference formulation is sketched after this list)
- Multimodal large language models struggle with long-horizon video understanding due to limited context windows
- Existing methods fall into vision-centric (high latency) or text-centric (detail loss) extremes
- Research was published on arXiv under identifier arXiv:2603.01455v3
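The summary does not give MM-Mem's formal objective. As a reference point only, a "Semantic Information Bottleneck" presumably adapts the classical information bottleneck of Tishby et al., which learns a compressed representation Z of an input X that stays informative about a task variable Y:

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Read in MM-Mem's terms (an assumption, not a stated result): X would be the raw perceptual stream, Z the memory content retained across the hierarchy, Y the downstream video-understanding target, and \beta the knob trading compression against task relevance.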