E3C: Egocentric Video Generation with 3D Memory and Pose Control

ai-technology · 2026-05-27

Researchers have introduced E3C, a controllable video diffusion framework for egocentric video generation. The system builds a semi-dense point-cloud 3D memory from context frames, augments each point with appearance descriptors drawn from video-VAE features, and renders that memory into the target viewpoints. By disentangling persistent scene structure from human-driven dynamics, it addresses challenges such as rapid viewpoint changes, self-occlusions, and subtle articulated actions. The framework is intended to help embodied agents reason about actions and scene changes. The paper is available on arXiv (ID 2605.26316).

Key facts

E3C is a controllable video diffusion framework for egocentric generation.
It constructs a semi-dense point cloud-based 3D memory from context frames.
Each point is augmented with appearance descriptors from video-VAE features.
The memory is rendered into target viewpoints.
It disentangles persistent scene structure from human-driven dynamics.
The framework addresses rapid viewpoint changes, self-occlusions, and subtle articulated actions.
It is designed for embodied agents to reason about actions and scene changes.
The paper is on arXiv with ID 2605.26316.
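
To make the "point memory rendered into target viewpoints" idea concrete, here is a minimal Python sketch. It is not the E3C implementation: the summary does not say how rendering is done, so the pinhole camera, the nearest-point (z-buffer) rule, and every name here (`MemoryPoint`, `render_memory`, `focal`) are illustrative assumptions. Short lists of floats stand in for the video-VAE appearance descriptors.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class MemoryPoint:
    """One entry of a hypothetical point-cloud memory."""
    position: Sequence[float]   # world-space (x, y, z)
    descriptor: List[float]     # stand-in for a video-VAE appearance feature


def render_memory(
    points: Sequence[MemoryPoint],
    rotation: Sequence[Sequence[float]],  # 3x3 world-to-camera rotation
    translation: Sequence[float],         # world-to-camera translation
    focal: float,                         # assumed pinhole focal length, in pixels
    width: int,
    height: int,
) -> List[List[Optional[List[float]]]]:
    """Project every point into the target camera and keep, per pixel,
    the descriptor of the nearest point. Pixels that no point reaches
    stay None."""
    depth = [[math.inf] * width for _ in range(height)]
    feats: List[List[Optional[List[float]]]] = [
        [None] * width for _ in range(height)
    ]
    cx, cy = width / 2.0, height / 2.0

    for p in points:
        # World -> camera coordinates: x_cam = R @ x_world + t
        xc, yc, zc = [
            sum(rotation[r][c] * p.position[c] for c in range(3)) + translation[r]
            for r in range(3)
        ]
        if zc <= 0:
            continue  # behind the camera, cannot be seen
        # Pinhole projection to integer pixel coordinates
        u = math.floor(focal * xc / zc + cx)
        v = math.floor(focal * yc / zc + cy)
        if 0 <= u < width and 0 <= v < height and zc < depth[v][u]:
            depth[v][u] = zc           # nearer point occludes farther ones
            feats[v][u] = p.descriptor
    return feats
```

Rendering the same memory under a different `rotation` and `translation` gives a different feature image without touching the stored points. In this sketch, that is the sense in which persistent scene structure is kept apart from whatever changes between frames.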
