IVGT: Implicit Visual Geometry Transformer for 3D Scene Reconstruction

ai-technology · 2026-05-18

IVGT (Implicit Visual Geometry Transformer) is a framework for reconstructing 3D geometry and appearance from unposed multi-view images. Current visual geometry foundation models produce explicit geometry as pixel-aligned pointmaps, which suffer from redundancy and limited geometric continuity. IVGT instead uses an implicit formulation that models continuous, coherent geometry. It learns a continuous neural scene representation in a canonical coordinate system, so a spatial query at any 3D location retrieves local features. Lightweight decoders then map those features to signed distance function (SDF) values and colors. Continuous surface geometry can be extracted directly from this representation, and the model can render RGB images, depth maps, and surface normal maps from arbitrary viewpoints. IVGT is trained through multi-dataset joint optimization. The paper is available on arXiv under the identifier 2605.16258.
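
The query-then-decode pattern described above can be illustrated with a toy example. This is a minimal sketch, not IVGT's architecture: the summary does not say how the scene representation is stored or interpolated, so the dense feature grid, the trilinear interpolation, and the names `make_toy_grid`, `trilinear`, and `query` are all assumptions made for illustration. The "decoders" here simply read the SDF and color out of the feature vector, standing in for the small learned networks the paper describes.

```python
import math

def make_toy_grid(res, center=(0.5, 0.5, 0.5), radius=0.3):
    """Hypothetical scene representation: a res^3 grid over [0, 1]^3.

    Each vertex stores a 4-value feature [sdf, r, g, b], where sdf is the
    analytic signed distance to a sphere and the color equals the position.
    """
    n = res - 1
    grid = []
    for i in range(res):
        plane = []
        for j in range(res):
            row = []
            for k in range(res):
                p = (i / n, j / n, k / n)
                dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, center)))
                row.append([dist - radius, p[0], p[1], p[2]])
            plane.append(row)
        grid.append(plane)
    return grid

def trilinear(grid, p):
    """Interpolate a feature vector at a continuous point p in [0, 1]^3."""
    n = len(grid) - 1  # number of cells per axis
    idx, frac = [], []
    for coord in p:
        x = min(max(coord, 0.0), 1.0) * n
        i = min(int(x), n - 1)  # keep the upper neighbour inside the grid
        idx.append(i)
        frac.append(x - i)
    dim = len(grid[0][0][0])
    out = [0.0] * dim
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((frac[0] if di else 1.0 - frac[0])
                     * (frac[1] if dj else 1.0 - frac[1])
                     * (frac[2] if dk else 1.0 - frac[2]))
                feat = grid[idx[0] + di][idx[1] + dj][idx[2] + dk]
                for c in range(dim):
                    out[c] += w * feat[c]
    return out

def query(grid, p):
    """Return (sdf, rgb) at p; trivial read-out stands in for learned decoders."""
    feat = trilinear(grid, p)
    sdf = feat[0]
    rgb = tuple(min(max(v, 0.0), 1.0) for v in feat[1:4])
    return sdf, rgb

grid = make_toy_grid(9)
print(query(grid, (0.5, 0.5, 0.5)))     # inside the sphere: negative SDF
print(query(grid, (0.3, 0.62, 0.91)))   # off-grid point: still a valid query
```

The point of the sketch is that the query location is not tied to any input pixel: any 3D position yields a feature, and from it an SDF value and a color.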

Key facts

IVGT stands for Implicit Visual Geometry Transformer.
It reconstructs 3D geometry and appearance from unposed multi-view images.
Existing visual geometry foundation models produce explicit geometry as pixel-aligned pointmaps, which suffer from redundancy and limited geometric continuity.
IVGT uses an implicit formulation to model continuous and coherent geometry.
It learns a neural scene representation in a canonical coordinate system.
The model supports continuous spatial queries at any 3D position.
It predicts signed distance function (SDF) values and colors using lightweight decoders.
IVGT can render RGB images, depth maps, and surface normal maps from arbitrary viewpoints.
Training involves multi-dataset joint optimization.
The paper is published on arXiv with ID 2605.16258.
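
To see how depth and surface normals can be read off an SDF from an arbitrary viewpoint, here is a minimal sketch using sphere tracing and finite-difference gradients. Both are standard techniques for SDFs in general; the summary does not state which renderer IVGT uses, so this should not be read as the paper's method. The analytic `sphere_sdf` is a hypothetical stand-in for a learned SDF, and all function names and parameters are illustrative.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Stand-in SDF: signed distance from point p to a sphere."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, center))) - radius

def sphere_trace(sdf, origin, direction, t_max=20.0, eps=1e-5, max_steps=128):
    """March along a unit-length ray; return the hit distance or None on a miss.

    Each step advances by the SDF value, which is the largest step that
    cannot pass through the surface.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t
        t += dist
        if t > t_max:
            break
    return None

def surface_normal(sdf, p, h=1e-4):
    """Unit normal at p from the central-difference gradient of the SDF."""
    grad = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += h
        lo[axis] -= h
        grad.append((sdf(tuple(hi)) - sdf(tuple(lo))) / (2.0 * h))
    norm = math.sqrt(sum(g * g for g in grad))
    return tuple(g / norm for g in grad)

origin, direction = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
depth = sphere_trace(sphere_sdf, origin, direction)
hit = tuple(o + depth * d for o, d in zip(origin, direction))
print(depth, surface_normal(sphere_sdf, hit))
```

Repeating this per pixel, with one ray per pixel of a virtual camera, gives a depth map and a normal map; querying a color at each hit point gives an RGB image.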
