Geometry Forcing Bridges Video Diffusion and 3D World Modeling
Researchers propose Geometry Forcing, a method for integrating 3D geometric awareness into video diffusion models. The approach aligns the diffusion model's intermediate representations with features from a geometric foundation model using two objectives: Angular Alignment, which enforces directional consistency, and Scale Alignment, which preserves scale information. The method targets a limitation of standard video diffusion models, which fail to capture meaningful 3D structure when trained on 2D video data. The paper is available on arXiv (2507.07982).
Key facts
- Geometry Forcing aligns video diffusion model representations with geometric foundation model features.
- It uses two alignment objectives: Angular Alignment (cosine similarity) and Scale Alignment (regression).
- Addresses the gap between 2D video diffusion and 3D world modeling.
- Paper available on arXiv with ID 2507.07982.
- The method encourages geometry-aware intermediate representations.
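The two alignment objectives listed above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. It assumes the diffusion features `h` and the geometric features `g` are already projected to the same shape, and it takes the Scale Alignment regression to be a mean squared error on feature magnitudes, which is a guess at one plausible form. The function names and the `scale_weight` parameter are hypothetical.

```python
import numpy as np


def angular_alignment_loss(h, g, eps=1e-8):
    """Mean of (1 - cosine similarity) over feature vectors.

    Approaches 0 when h and g point in the same direction, regardless of scale.
    """
    h_unit = h / (np.linalg.norm(h, axis=-1, keepdims=True) + eps)
    g_unit = g / (np.linalg.norm(g, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(h_unit * g_unit, axis=-1)))


def scale_alignment_loss(h, g):
    """Mean squared error between feature magnitudes.

    Assumption: the source only says "regression"; this form is illustrative.
    """
    h_norm = np.linalg.norm(h, axis=-1)
    g_norm = np.linalg.norm(g, axis=-1)
    return float(np.mean((h_norm - g_norm) ** 2))


def combined_alignment_loss(h, g, scale_weight=1.0):
    """Hypothetical weighted sum of the two objectives."""
    return angular_alignment_loss(h, g) + scale_weight * scale_alignment_loss(h, g)


# Toy example: h has the same direction as g but twice the magnitude.
rng = np.random.default_rng(0)
g = rng.normal(size=(16, 32))  # stand-in for geometric foundation model features
h = 2.0 * g                    # stand-in for projected diffusion features

print(angular_alignment_loss(h, g))  # close to 0: directions agree
print(scale_alignment_loss(h, g))    # positive: magnitudes differ
```

The toy example shows why a cosine objective alone would not be enough under these assumptions: rescaling `h` leaves the angular term near zero, so a separate scale term is needed to penalize the mismatch in magnitude.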
Entities
Institutions
- arXiv