New Benchmark Tests AI Agents on Long-Horizon Spatial Biology
A new benchmark, SpatialBench-Long, has been released to assess how well artificial intelligence agents reason scientifically over spatial biology data. It stands apart from other benchmarks by requiring agents to recover biological conclusions from raw spatial data and experiment-specific context, with no prescribed methods. It comprises 24 evaluations across several biological systems, including pancreatic ductal adenocarcinoma (PDAC), engineered glioblastoma organoids, lung adenocarcinoma, and mouse optic nerve. The benchmark integrates multiple data types, such as single-cell RNA sequencing (scRNA-seq) and histology, and strengthens candidate findings through replication.
Key facts
- SpatialBench-Long evaluates AI agents on long-horizon spatial biology.
- The benchmark requires agents to recover biological claims from raw data.
- It contains 24 evaluations across multiple biological systems.
- Systems include PDAC, glioblastoma organoids, lung adenocarcinoma, and mouse optic nerve.
- Data types include CosMx, Visium, Xenium, MERFISH, scRNA-seq, Slide-seq, Slide-tags, histology, and lineage-recording.
- No prescribed methods are given to agents.
- Benchmark tests end-to-end scientific reasoning.
- Published on arXiv with ID 2605.28065.
Entities
Institutions
- arXiv