ArchSIBench: New Benchmark Tests Architectural Spatial Intelligence in VLMs

ai-technology · 2026-05-22

Researchers have introduced ArchSIBench, a benchmark for assessing the architectural spatial intelligence of Vision-Language Models (VLMs). Unlike existing benchmarks, which mainly test basic spatial abilities such as counting objects and judging relative orientation, ArchSIBench targets higher-level cognition of architectural spaces, including layout comprehension, circulation patterns, and functional zoning. It draws on architecture, cognitive science, and psychology, and spans five core dimensions (perception, reasoning, navigation, transformation, and configuration) with a total of 17 fine-grained subtasks. The work, published on arXiv (ID: 2605.20837), relies on careful manual annotation and aims to support robot navigation, embodied interaction, and 3D scene understanding and generation.

Key facts

ArchSIBench evaluates architectural spatial intelligence in VLMs.
It covers five core dimensions: perception, reasoning, navigation, transformation, and configuration.
The benchmark includes 17 fine-grained subtasks.
It draws on perspectives from architecture, cognitive science, and psychology.
It is published on arXiv (ID: 2605.20837).
It focuses on higher-level spatial cognition rather than basic skills such as object counting and relative orientation.
It aims to improve robot navigation, embodied interaction, and 3D scene understanding and generation.
It involves careful manual annotation.
