PDI-Bench: A Quantitative Framework for Geometric Consistency in Video Generation
Researchers have introduced PDI-Bench (Perspective Distortion Index), a quantitative framework for evaluating how well AI-generated videos maintain geometric consistency. It addresses the difficulty of assessing the realism of 3D shapes and movements in generative video models, where existing evaluation pipelines often rely on human judgment or learned graders. For each video, PDI-Bench gathers object-centric observations from segmentation and point-tracking tools (such as SAM 2, MegaSaM, and CoTracker3) and lifts them to 3D world-space coordinates through monocular reconstruction. It then measures projective-geometry errors along three dimensions: scale-depth alignment, 3D motion consistency, and 3D structural rigidity. To support thorough evaluation, the researchers also compiled the PDI-Dataset, which covers diverse scenarios designed to stress geometric consistency. The work is described in a paper on arXiv (2605.15185).
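To make the lifting and rigidity ideas concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function names `unproject` and `rigidity_residual`, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the residual formula are all assumptions chosen for illustration. The points also stay in camera space, which matches world space only if the camera is static or its pose has already been applied.

```python
import math
from itertools import combinations


def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a depth value into camera-space XYZ.

    Assumes an ideal pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def rigidity_residual(frames):
    """Mean relative change in pairwise 3D distances versus the first frame.

    frames: list of frames, each a list of (x, y, z) points in the same order.
    A rigid object keeps all pairwise distances fixed, so the result is 0.0.
    This formula is an illustrative assumption, not the paper's definition.
    """
    ref = frames[0]
    pairs = list(combinations(range(len(ref)), 2))
    ref_dists = [math.dist(ref[i], ref[j]) for i, j in pairs]
    total, count = 0.0, 0
    for pts in frames[1:]:
        for (i, j), d0 in zip(pairs, ref_dists):
            if d0 == 0.0:
                continue  # skip coincident reference points
            total += abs(math.dist(pts[i], pts[j]) - d0) / d0
            count += 1
    return total / count if count else 0.0


# Example: a triangle that translates rigidly, then one that doubles in size.
triangle = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
shifted = [(x + 0.5, y, z) for x, y, z in triangle]
stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in triangle]
rigid_score = rigidity_residual([triangle, shifted])
stretch_score = rigidity_residual([triangle, stretched])
```

A pure translation leaves every pairwise distance unchanged, so `rigid_score` is zero, while the uniformly stretched triangle yields a positive residual. A relative (rather than absolute) distance change is used here so that the score does not depend on the unknown metric scale of a monocular reconstruction.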
Key facts
- PDI-Bench is a quantitative framework for auditing geometric coherence in generated videos.
- It uses SAM 2, MegaSaM, and CoTracker3 for segmentation and point tracking.
- Three failure dimensions are measured: scale-depth alignment, 3D motion consistency, and 3D structural rigidity.
- PDI-Dataset covers diverse scenarios designed to stress-test geometric consistency.
- The paper is available on arXiv with ID 2605.15185.
- Existing video evaluation pipelines rely on human judgment or learned graders.
- The framework lifts observations to 3D world-space coordinates via monocular reconstruction.
- Generative video models are studied as implicit world models.
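The scale-depth alignment dimension listed above can be illustrated with a small sketch. Under a pinhole camera, a rigid object of fixed physical size appears with image size inversely proportional to its depth, so the product of apparent size and depth should stay constant across frames. The function name `scale_depth_residual` and the use of a coefficient of variation are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pstdev


def scale_depth_residual(apparent_sizes, depths):
    """Coefficient of variation of (apparent size * depth) across frames.

    apparent_sizes: per-frame image-space size of the object (e.g. mask width in pixels).
    depths: per-frame depth of the same object from the reconstruction.
    Returns 0.0 when size shrinks exactly in proportion to 1/depth.
    Illustrative assumption only, not the paper's metric.
    """
    if len(apparent_sizes) != len(depths):
        raise ValueError("apparent_sizes and depths must have the same length")
    products = [s * z for s, z in zip(apparent_sizes, depths)]
    return pstdev(products) / mean(products)


# Consistent: the object halves in size each time its depth doubles.
consistent = scale_depth_residual([100.0, 50.0, 25.0], [2.0, 4.0, 8.0])
# Inconsistent: the object recedes but keeps the same apparent size.
inconsistent = scale_depth_residual([100.0, 100.0], [2.0, 4.0])
```

Here `consistent` evaluates to zero because every size-depth product is 200, while `inconsistent` is positive because the object's apparent size fails to shrink as it moves away.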
Entities
Institutions
- arXiv