PanoWorld Generates Geometry-Consistent 360° Video from Single Image

ai-technology · 2026-05-18

Researchers have introduced PanoWorld, a panoramic video world model that generates geometry-consistent 360° video from a single image and a caption. Existing panoramic video methods prioritize visual realism but do not explicitly constrain the underlying 3D scene, which leads to inconsistent depth and implausible motion across the spherical surface. PanoWorld instead models latent states that keep geometry and dynamics consistent, rather than optimizing only for visual output. The system builds on a pre-trained perspective video world model, applies a spherical-geometry-aware adaptation to its conditioning and positional encodings, and adds two lightweight regularizers: a depth consistency loss and a trajectory consistency loss. The paper is available on arXiv as 2605.15391.
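To make the idea of "lightweight regularizers" concrete, here is a minimal Python sketch of how two auxiliary terms might be added to a base training loss. The summary does not give the form of either loss, so everything here is an assumption for illustration: the mean-absolute-difference terms, the weights `lambda_depth` and `lambda_traj`, and the function names are all hypothetical and are not taken from the PanoWorld paper.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length sequences of floats."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def regularized_loss(base_loss, depth_pred, depth_ref, traj_pred, traj_ref,
                     lambda_depth=0.1, lambda_traj=0.1):
    """Base loss plus two weighted consistency penalties.

    Assumption: both penalties are simple L1-style placeholders; the actual
    depth and trajectory consistency losses are not described in the summary.
    """
    depth_term = mean_abs_diff(depth_pred, depth_ref)  # hypothetical depth consistency term
    traj_term = mean_abs_diff(traj_pred, traj_ref)     # hypothetical trajectory consistency term
    return base_loss + lambda_depth * depth_term + lambda_traj * traj_term


if __name__ == "__main__":
    loss = regularized_loss(
        base_loss=1.0,
        depth_pred=[1.0, 2.0], depth_ref=[1.0, 4.0],
        traj_pred=[0.0, 0.0], traj_ref=[1.0, 1.0],
        lambda_depth=0.5, lambda_traj=0.25,
    )
    print(loss)
```

The point of the sketch is only the structure: the regularizers are extra weighted terms on top of whatever objective the pre-trained backbone already uses, which is what lets them stay lightweight.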

Key facts

PanoWorld generates 360° video from a single image and caption.
It enforces geometry and dynamics consistency in latent state modeling.
Two regularizers: depth consistency loss and trajectory consistency loss.
Built on a pre-trained perspective video world model.
Spherical-geometry-aware adaptation applied to conditioning and positional encodings.
Addresses inconsistent depth and implausible motion in existing methods.
Published on arXiv with ID 2605.15391.
The arXiv announce type is cross, meaning the paper is cross-listed.
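Why positional encodings might need spherical awareness can be illustrated with a short Python sketch. It assumes the panorama is stored in the common equirectangular layout, which the summary does not confirm for PanoWorld; the function names and axis convention are hypothetical. The sketch maps pixel centres to unit directions on the sphere and compares angular distances.

```python
import math


def equirect_to_direction(col, row, width, height):
    """Map an equirectangular pixel centre to a unit direction (x, y, z).

    Assumed convention: longitude spans [-pi, pi) left to right,
    latitude spans +pi/2 (top row) to -pi/2 (bottom row).
    """
    lon = ((col + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (row + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def great_circle_angle(d1, d2):
    """Angle in radians between two unit direction vectors."""
    dot = sum(a * b for a, b in zip(d1, d2))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error


if __name__ == "__main__":
    width, height, row = 8, 4, 1
    left_edge = equirect_to_direction(0, row, width, height)
    right_edge = equirect_to_direction(width - 1, row, width, height)
    middle = equirect_to_direction(3, row, width, height)
    # The two image edges are neighbours on the sphere, while a pixel that is
    # closer in the flat image is farther away in angle.
    print(great_circle_angle(left_edge, right_edge))
    print(great_circle_angle(left_edge, middle))
```

In the flat image, column 0 and column 7 are as far apart as two pixels in a row can be, yet on the sphere they are adjacent across the wrap-around seam. A positional encoding inherited from a perspective video model would not capture this by default, which is one plausible motivation for a spherical-geometry-aware adaptation.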
