SPM-Bench: New Benchmark Tests LLMs on Scanning Probe Microscopy
Researchers have introduced SPM-Bench, a PhD-level multimodal benchmark for evaluating large language models (LLMs) on scanning probe microscopy (SPM). The benchmark targets two limitations of existing scientific benchmarks: data contamination and insufficient complexity. To build it, the authors developed a fully automated data synthesis pipeline that uses Anchor-Gated Sieve (AGS) technology to extract high-value image-text pairs from arXiv and journal papers published between 2023 and 2025. The pipeline has a hybrid cloud-local architecture: vision-language models (VLMs) return spatial coordinates, and the high-fidelity cropping is then performed locally, which saves tokens and preserves dataset integrity. The work also introduces a Strict Imperfection metric for evaluating LLM performance.
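To make the coordinate-based cropping step concrete, here is a minimal sketch in Python. It is not the authors' implementation. The summary does not say how the VLM encodes coordinates, so the normalized `(x0, y0, x1, y1)` box format, the `crop_normalized` function name, and the nested-list image representation are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical image type: rows of grayscale pixel values.
Image = List[List[int]]


def crop_normalized(image: Image, box: Tuple[float, float, float, float]) -> Image:
    """Crop `image` to a box given as (x0, y0, x1, y1) fractions in [0, 1].

    Assumption: the model returns normalized coordinates with the origin
    at the top-left corner. The source does not specify the real format.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0, x1, y1 = box
    if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
        raise ValueError(f"invalid box: {box}")

    # Convert fractions to pixel indices: floor the start, ceil the end,
    # so the crop never cuts into the region the box describes.
    left, top = int(x0 * width), int(y0 * height)
    right, bottom = math.ceil(x1 * width), math.ceil(y1 * height)

    # The crop runs on the full-resolution local copy of the image.
    return [row[left:right] for row in image[top:bottom]]


# A 4x4 test image whose pixel value encodes its position (row * 4 + column).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
top_right = crop_normalized(image, (0.5, 0.0, 1.0, 0.5))
print(top_right)
```

Under these assumptions, the appeal of the design is that the model only needs to emit four numbers per region, while the pixels themselves are cut from the original file locally.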
Key facts
- SPM-Bench is a PhD-level multimodal benchmark for scanning probe microscopy.
- It uses an automated data synthesis pipeline with Anchor-Gated Sieve (AGS) technology.
- Data is sourced from arXiv and journal papers published between 2023 and 2025.
- The pipeline employs a hybrid cloud-local architecture for token savings.
- VLMs return spatial coordinates for local high-fidelity cropping.
- A Strict Imperfection metric is introduced for evaluation.
- The benchmark aims to address data contamination and insufficient complexity in existing benchmarks.
- The work is published on arXiv with identifier 2602.22971.
Entities
Institutions
- arXiv