ARTFEED — Contemporary Art Intelligence

Ilov3Splat: Open-Vocabulary 3D Scene Understanding via Gaussian Splatting

ai-technology · 2026-05-07

Researchers have introduced Ilov3Splat, a framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Unlike prior methods that rely on 2D rendering-based matching or point-level semantic association, Ilov3Splat jointly optimizes scene geometry and semantics by augmenting Gaussian splats with view-consistent feature fields. A multi-resolution hash embedding encodes language-aligned CLIP features for dense language grounding in 3D space, and an instance feature field is trained with a contrastive loss over SAM masks to distinguish individual objects at a fine-grained level. At inference, CLIP-encoded queries are matched against the learned features with two-stage 3D clustering. The paper is available on arXiv.
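
The query-matching step at inference can be pictured with a small sketch. It is an illustration under stated assumptions, not the paper's implementation: it assumes each Gaussian carries a language-aligned feature vector and that matching is done by cosine similarity against a text embedding. The function names, the 0.8 threshold, and the toy vectors are hypothetical, and the two-stage 3D clustering is not reproduced.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def select_gaussians(query_embedding, gaussian_features, threshold=0.8):
    """Return indices of Gaussians whose feature is close to the query.

    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    return [
        i
        for i, feature in enumerate(gaussian_features)
        if cosine_similarity(query_embedding, feature) >= threshold
    ]


# Toy stand-ins for a CLIP text embedding and per-Gaussian features.
query = [1.0, 0.0, 0.0]
features = [
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.1],  # nearly parallel, different magnitude
]
selected = select_gaussians(query, features)
```

In a pipeline like the one described, the selected Gaussians would then be passed to a clustering stage that groups them into object instances.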

Key facts

  • Ilov3Splat is a framework for instance-level open-vocabulary 3D scene understanding.
  • It is built on 3D Gaussian Splatting (3D-GS).
  • Prior work depends on 2D rendering-based matching or point-level semantic association.
  • The method jointly optimizes scene geometry and semantic representations.
  • It augments Gaussian splats with view-consistent feature fields.
  • A multi-resolution hash embedding encodes language-aligned CLIP features.
  • An instance feature field is trained with a contrastive loss over SAM masks.
  • Inference uses CLIP-encoded queries matched against learned features with two-stage 3D clustering.
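
The contrastive training signal over SAM masks listed above can be sketched as follows. This is a generic supervised contrastive (InfoNCE-style) loss, chosen as an assumption because the summary does not specify the exact formulation: features sampled from the same mask are treated as positives and all other samples as negatives. The function names, the temperature of 0.1, and the toy data are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def mask_contrastive_loss(features, mask_ids, temperature=0.1):
    """Average InfoNCE-style loss over all same-mask (positive) pairs."""
    feats = [normalize(f) for f in features]
    n = len(feats)
    total, count = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarity from sample i to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in logits if mask_ids[j] == mask_ids[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all other samples.
        peak = max(logits.values())
        log_denominator = peak + math.log(
            sum(math.exp(v - peak) for v in logits.values())
        )
        for j in positives:
            total += log_denominator - logits[j]
            count += 1
    return total / count if count else 0.0


# Toy features: two tight groups in 2D.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = mask_contrastive_loss(toy_features, [0, 0, 1, 1])
loss_mismatched = mask_contrastive_loss(toy_features, [0, 1, 0, 1])
```

When the mask labels agree with the feature groups, the loss is close to zero; when the labels cut across the groups, it is much larger, which is the gradient signal that would pull same-mask features together during training.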

Entities

Institutions

  • arXiv

Sources