ARTFEED — Contemporary Art Intelligence

TouchSafeBench: Benchmarking Collision Grounding in VLMs for Human-Robot Safety

ai-technology · 2026-06-01

TouchSafeBench is a new benchmark that evaluates vision-language models (VLMs) on collision grounding: determining whether a robot is in a safe position, currently colliding, or about to collide with a person or object. Built in Habitat 3.0, the benchmark contains 2,940 simulated indoor co-presence episodes spanning social navigation and social rearrangement tasks. It provides synchronized multi-view RGB-D observations, top-down trajectory maps, calibrated camera metadata, and contact labels derived from the simulator. The research focuses on two deployment-facing tasks: classifying the current safety state and warning about imminent collisions. It stresses that safe human-robot collaboration takes more than visual description: a model must tie what it sees to robot body geometry, camera viewpoint, scene layout, human proximity, and motion over time. The paper is available on arXiv as arXiv:2605.31196.

Key facts

  • TouchSafeBench is a physics-grounded benchmark for collision grounding in VLMs.
  • Built in Habitat 3.0.
  • Contains 2,940 simulated indoor co-presence episodes.
  • Covers social navigation and social rearrangement tasks.
  • Provides synchronized multi-view RGB-D observations, top-down trajectory maps, calibrated camera metadata, and simulator-derived contact labels.
  • Two deployment-facing tasks: classifying current safety state and warning about imminent collision.
  • Collision grounding requires binding visual observations to robot body geometry, camera viewpoint, scene layout, human proximity, and temporal motion.
  • Paper ID: arXiv:2605.31196.
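
To make the two deployment-facing tasks concrete, here is a minimal Python sketch of how per-frame contact flags could be turned into the three safety states described above (safe, imminent collision, collision). This is an illustration only, not the benchmark's actual labeling procedure: the `SafetyState` enum, the `label_frames` function, and the `horizon` look-ahead window are all hypothetical names and assumptions, and the summary does not say how TouchSafeBench defines "imminent".

```python
from enum import Enum
from typing import List


class SafetyState(Enum):
    """Three-way safety label (names are illustrative, not from the paper)."""
    SAFE = "safe"
    IMMINENT = "imminent_collision"
    COLLISION = "collision"


def label_frames(contact: List[bool], horizon: int = 5) -> List[SafetyState]:
    """Map per-frame contact flags to safety labels.

    contact[i] is True when the simulator reports contact at frame i.
    horizon is an ASSUMED look-ahead window in frames; the real benchmark
    may define imminence differently (for example by time or distance).
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    labels = []
    for i, in_contact in enumerate(contact):
        if in_contact:
            # Contact right now: the current-state task answers "collision".
            labels.append(SafetyState.COLLISION)
        elif any(contact[i + 1 : i + 1 + horizon]):
            # No contact yet, but one occurs within the look-ahead window:
            # this is where the warning task should fire.
            labels.append(SafetyState.IMMINENT)
        else:
            labels.append(SafetyState.SAFE)
    return labels


if __name__ == "__main__":
    flags = [False, False, False, True, True, False]
    for frame, state in enumerate(label_frames(flags, horizon=2)):
        print(frame, state.value)
```

Under this assumed scheme, the current-state task corresponds to predicting the label of a given frame, and the warning task corresponds to detecting the `IMMINENT` frames before contact begins. A model is then scored against labels that come from simulator contact rather than from human annotation of the images.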

Entities

Institutions

  • arXiv
