E3C: Egocentric Video Generation with 3D Memory and Pose Control
Researchers have introduced E3C, a controllable video diffusion framework for egocentric video generation. The system builds a semi-dense, point-cloud-based 3D memory from context frames, augments each point with appearance descriptors drawn from video-VAE features, and renders that memory into the target viewpoints. This design disentangles persistent scene structure from human-driven dynamics, which helps with challenges specific to egocentric footage: rapid viewpoint changes, self-occlusions, and subtle articulated actions. The framework is intended to help embodied agents reason about actions and the scene changes they cause. The paper is available on arXiv.
Key facts
- E3C is a controllable video diffusion framework for egocentric video generation.
- It constructs a semi-dense, point-cloud-based 3D memory from context frames.
- Each point is augmented with appearance descriptors from video-VAE features.
- The memory is rendered into target viewpoints.
- It disentangles persistent scene structure from human-driven dynamics.
- The framework addresses rapid viewpoint changes, self-occlusions, and subtle articulated actions.
- It is designed for embodied agents to reason about actions and scene changes.
- The paper is on arXiv with ID 2605.26316.
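
To make the "memory rendered into target viewpoints" idea concrete, here is a minimal sketch of projecting a point cloud with per-point descriptors into a target camera. It is not the paper's implementation: the pinhole camera model, the nearest-point z-buffer, and every name in it (`render_point_memory`, `points`, `feats`, `K`, `R`, `t`) are assumptions chosen for illustration, and the descriptors are plain arrays rather than real video-VAE features.

```python
import numpy as np

def render_point_memory(points, feats, K, R, t, height, width):
    """Project per-point descriptors into a target view (illustrative only).

    points: (N, 3) world-space positions of the memory points
    feats:  (N, C) appearance descriptor attached to each point
    K:      (3, 3) pinhole intrinsics (assumed camera model)
    R, t:   world-to-camera rotation (3, 3) and translation (3,)
    Returns a (height, width, C) feature map and a (height, width) depth map;
    depth is inf wherever no point lands.
    """
    cam = points @ R.T + t                      # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6                 # drop points behind the camera
    cam, f = cam[in_front], feats[in_front]

    uvw = cam @ K.T                             # perspective projection
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, f = u[inside], v[inside], cam[inside, 2], f[inside]

    # Keep only the nearest point per pixel: sort near-to-far, then take the
    # first occurrence of each flat pixel index.
    order = np.argsort(z, kind="stable")
    pix = (v * width + u)[order]
    _, first = np.unique(pix, return_index=True)
    keep = order[first]

    feature_map = np.zeros((height, width, feats.shape[1]), dtype=feats.dtype)
    depth = np.full((height, width), np.inf)
    feature_map[v[keep], u[keep]] = f[keep]
    depth[v[keep], u[keep]] = z[keep]
    return feature_map, depth
```

Because the point cloud is semi-dense, a rendering like this leaves holes wherever no point projects. In a framework of this kind, the rendered map would serve as conditioning for the diffusion model, which fills in the missing regions and the human-driven dynamics.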
Entities
Institutions
- arXiv