SPM-Bench: New Benchmark Tests LLMs on Scanning Probe Microscopy

ai-technology · 2026-06-01

Researchers have released SPM-Bench, a new multimodal benchmark that evaluates large language models (LLMs) on PhD-level problems in scanning probe microscopy (SPM). The benchmark targets two weaknesses of existing scientific benchmarks: data contamination and insufficient complexity. To build it, the authors developed a fully automated data synthesis pipeline that uses an Anchor-Gated Sieve (AGS) to extract high-value image-text pairs from arXiv and journal papers published between 2023 and 2025. The pipeline follows a hybrid cloud-local design. Vision-language models (VLMs) return spatial coordinates, and the high-fidelity cropping is then performed locally, which saves tokens and preserves dataset integrity. The benchmark also introduces a Strict Imperfection metric for precise evaluation of LLM performance.
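The article does not describe the exact cropping protocol, so the following is only a minimal sketch of the general idea behind "coordinates from the cloud, cropping done locally." It assumes that the VLM returns a bounding box normalized to [0, 1] as (x_min, y_min, x_max, y_max) and that the full-resolution figure stays on the local machine. The box format and the helper names `to_pixel_box` and `crop_local` are hypothetical, and the image is modeled as a plain 2D list of pixel values to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), normalized to [0, 1].
# This is an assumption for illustration, not SPM-Bench's documented format.
NormBox = Tuple[float, float, float, float]


def to_pixel_box(box: NormBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Map a normalized box onto full-resolution pixel bounds (left, top, right, bottom)."""
    x0, y0, x1, y1 = box
    if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
        raise ValueError(f"invalid normalized box: {box}")
    left = int(x0 * width)
    top = int(y0 * height)
    # Keep at least one pixel and never exceed the image bounds.
    right = min(width, max(left + 1, round(x1 * width)))
    bottom = min(height, max(top + 1, round(y1 * height)))
    return left, top, right, bottom


def crop_local(pixels: Sequence[Sequence[int]], box: NormBox) -> List[List[int]]:
    """Crop the original row-major pixel grid locally; no image bytes are sent anywhere."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    left, top, right, bottom = to_pixel_box(box, width, height)
    return [list(row[left:right]) for row in pixels[top:bottom]]


if __name__ == "__main__":
    # 8x8 toy "figure" where each pixel value encodes its position (row * 8 + col).
    image = [[r * 8 + c for c in range(8)] for r in range(8)]
    print(crop_local(image, (0.25, 0.25, 0.5, 0.5)))  # [[18, 19], [26, 27]]
```

In this setup, the only thing that would travel back from the remote model is four numbers, while the crop is taken from the original pixels at full fidelity. That is one plausible reading of how such a design saves tokens without degrading the image data.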

Key facts

SPM-Bench is a PhD-level multimodal benchmark for scanning probe microscopy.
It uses an automated data synthesis pipeline with Anchor-Gated Sieve (AGS) technology.
Data is sourced from arXiv and journal papers published between 2023 and 2025.
The pipeline employs a hybrid cloud-local architecture for token savings.
VLMs return spatial coordinates for local high-fidelity cropping.
A Strict Imperfection metric is introduced for evaluation.
The benchmark aims to address data contamination and insufficient complexity in existing benchmarks.
The work is published on arXiv with identifier 2602.22971.
