ARTFEED — Contemporary Art Intelligence

E3C: Egocentric Video Generation with 3D Memory and Pose Control

ai-technology · 2026-05-27

Researchers have introduced E3C, a controllable video diffusion framework for egocentric video generation. The system builds a semi-dense, point-cloud-based 3D memory from context frames, augments each point with appearance descriptors drawn from video-VAE features, and renders that memory into target viewpoints. The framework disentangles persistent scene structure from human-driven dynamics, targeting challenges such as rapid viewpoint changes, self-occlusions, and subtle articulated actions. It is intended to help embodied agents reason about actions and scene changes. The paper is available on arXiv (ID 2605.26316).

Key facts

  • E3C is a controllable video diffusion framework for egocentric video generation.
  • It constructs a semi-dense, point-cloud-based 3D memory from context frames.
  • Each point is augmented with appearance descriptors from video-VAE features.
  • The memory is rendered into target viewpoints.
  • It disentangles persistent scene structure from human-driven dynamics.
  • The framework addresses rapid viewpoint changes and self-occlusions.
  • It is designed to help embodied agents reason about actions and scene changes.
  • The paper is available on arXiv under ID 2605.26316.
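
To make the "point memory rendered into target viewpoints" idea concrete, here is a minimal sketch in plain Python. It is not E3C's renderer, which this summary does not describe. It assumes a simple pinhole camera and a nearest-point z-buffer, and every name in it (MemoryPoint, render_memory, the focal parameter) is hypothetical. The descriptors are short lists of floats standing in for video-VAE features.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class MemoryPoint:
    """One entry of a hypothetical point memory (illustrative only)."""
    position: Tuple[float, float, float]  # world-space XYZ
    descriptor: List[float]               # stand-in for a video-VAE feature


def world_to_camera(p: Sequence[float],
                    rotation: Sequence[Sequence[float]],
                    translation: Sequence[float]) -> Tuple[float, float, float]:
    """Apply x_cam = R * p + t for a 3x3 rotation and a 3-vector translation."""
    x, y, z = (
        sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    return x, y, z


def render_memory(points: Sequence[MemoryPoint],
                  rotation: Sequence[Sequence[float]],
                  translation: Sequence[float],
                  focal: float,
                  width: int,
                  height: int):
    """Splat descriptors into a target view, keeping the nearest point per pixel.

    Assumes a pinhole camera looking down +z with its principal point at the
    image center. Returns (features, depth); pixels that no point covers hold
    None in the feature map and infinity in the depth map.
    """
    cx, cy = width / 2.0, height / 2.0
    depth = [[math.inf] * width for _ in range(height)]
    features: List[List[Optional[List[float]]]] = [
        [None] * width for _ in range(height)
    ]
    for pt in points:
        x, y, z = world_to_camera(pt.position, rotation, translation)
        if z <= 0.0:
            continue  # point is behind the target camera
        u = math.floor(focal * x / z + cx)
        v = math.floor(focal * y / z + cy)
        inside = 0 <= u < width and 0 <= v < height
        if inside and z < depth[v][u]:
            depth[v][u] = z  # z-buffer test: the closer point wins
            features[v][u] = pt.descriptor
    return features, depth


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    memory = [
        MemoryPoint((0.0, 0.0, 2.0), [1.0]),
        MemoryPoint((0.0, 0.0, 1.0), [2.0]),  # occludes the point above
        MemoryPoint((1.0, 0.0, 2.0), [3.0]),
    ]
    feats, _ = render_memory(memory, identity, [0.0, 0.0, 0.0], 2.0, 4, 4)
    for row in feats:
        print(row)
```

In a sketch like this, the pixels left as None are the regions the memory cannot explain from the new viewpoint. A generative model conditioned on such a rendering would have to synthesize those regions, along with any dynamic content.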

Entities

Institutions

  • arXiv
