Ilov3Splat: Open-Vocabulary 3D Scene Understanding via Gaussian Splatting

ai-technology · 2026-05-07

Researchers introduced Ilov3Splat, a framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Unlike prior methods that rely on 2D rendering-based matching or point-level semantic association, Ilov3Splat jointly optimizes geometry and semantics by augmenting Gaussian splats with view-consistent feature fields. A multi-resolution hash embedding encodes CLIP features for dense language grounding in 3D space, and an instance feature field is trained with a contrastive loss over SAM masks so that individual objects can be told apart at a fine-grained level. At inference, CLIP-encoded queries are matched against the learned features with two-stage 3D clustering. The paper is available on arXiv.
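To make the instance-feature training idea concrete, here is a minimal sketch of a contrastive loss over mask labels. It is not the paper's implementation: the function name `mask_contrastive_loss`, the temperature value, and the InfoNCE-style formulation are all assumptions chosen for illustration. Features that fall inside the same SAM mask are treated as positives, and features from other masks as negatives.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def mask_contrastive_loss(features, mask_ids, temperature=0.1):
    """Hypothetical InfoNCE-style loss (an assumption, not the paper's exact loss).

    features: list of feature vectors, one per sampled pixel or Gaussian.
    mask_ids: list of SAM mask labels, one per feature.
    Pairs sharing a mask id are pulled together; all others are pushed apart.
    """
    feats = [normalize(f) for f in features]
    n = len(feats)
    total, count = 0.0, 0
    for i in range(n):
        others = [k for k in range(n) if k != i]
        if not others:
            continue
        # Temperature-scaled cosine similarity of the anchor to every sample.
        sims = [sum(a * b for a, b in zip(feats[i], feats[k])) / temperature
                for k in range(n)]
        # Numerically stable log-sum-exp over all samples except the anchor.
        m = max(sims[k] for k in others)
        log_denom = m + math.log(sum(math.exp(sims[k] - m) for k in others))
        for p in others:
            if mask_ids[p] == mask_ids[i]:
                total += log_denom - sims[p]
                count += 1
    # Average over all (anchor, positive) pairs; zero if no positives exist.
    return total / count if count else 0.0
```

With this formulation the loss is small when features from the same mask point in similar directions and large when they are mixed with features from other masks, which is the behavior a per-instance feature field needs.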

Key facts

Ilov3Splat is a framework for instance-level open-vocabulary 3D scene understanding.
It is built on 3D Gaussian Splatting (3D-GS).
Prior work depends on 2D rendering-based matching or point-level semantic association.
The method jointly optimizes scene geometry and semantic representations.
It augments Gaussian splats with view-consistent feature fields.
A multi-resolution hash embedding encodes language-aligned CLIP features.
An instance feature field is trained using a contrastive loss over SAM masks.
Inference uses CLIP-encoded queries matched against learned features with two-stage 3D clustering.
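The query-matching step at inference can be sketched as follows. This is an illustrative assumption, not the paper's pipeline: the helper `rank_instances` is hypothetical, the instance ids are assumed to come from an earlier clustering stage (the two-stage clustering itself is not reproduced here), and the query embedding stands in for the output of a CLIP text encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_instances(query_embedding, gaussian_features, instance_ids):
    """Rank instances by similarity between a text query and mean features.

    query_embedding: vector assumed to come from a CLIP text encoder.
    gaussian_features: per-Gaussian language features (same dimension).
    instance_ids: per-Gaussian instance label, assumed given by clustering.
    Returns a list of (instance_id, score), best match first.
    """
    sums, counts = {}, {}
    for feat, inst in zip(gaussian_features, instance_ids):
        acc = sums.setdefault(inst, [0.0] * len(feat))
        for d, x in enumerate(feat):
            acc[d] += x
        counts[inst] = counts.get(inst, 0) + 1
    # Score each instance by its mean feature's similarity to the query.
    scores = {
        inst: cosine(query_embedding, [x / counts[inst] for x in acc])
        for inst, acc in sums.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Averaging features per instance before scoring is one simple design choice; it returns whole objects rather than scattered points, which matches the instance-level goal described above.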
