New Benchmark Tests AI Agents on Long-Horizon Spatial Biology

other · 2026-05-28

SpatialBench-Long is a new benchmark that evaluates AI agents on long-horizon scientific reasoning over spatial biology data. It gives agents no prescribed methods: they must recover biological claims from raw spatial data and the tailored experimental context supplied with it, so the benchmark tests scientific reasoning end to end. It comprises 24 evaluations across several biological systems, including pancreatic ductal adenocarcinoma (PDAC), engineered glioblastoma organoids, lung adenocarcinoma, and mouse optic nerve. The benchmark spans multiple data types, such as single-cell RNA sequencing and histology, and candidate findings are strengthened through replication.

Key facts

SpatialBench-Long evaluates AI agents on long-horizon spatial biology.
Benchmark requires agents to recover biological claims from raw data.
Contains 24 evaluations across multiple biological systems.
Systems include PDAC, glioblastoma organoids, lung adenocarcinoma, and mouse optic nerve.
Data types include CosMx, Visium, Xenium, MERFISH, scRNA-seq, Slide-seq, Slide-tags, histology, and lineage-recording.
No prescribed methods are given to agents.
Benchmark tests end-to-end scientific reasoning.
Published on arXiv with ID 2605.28065.
