ARTFEED — Contemporary Art Intelligence

MolRecBench-Wild: New Benchmark for Chemical Structure Recognition

other · 2026-05-09

Researchers have released MolRecBench-Wild, a benchmark of 5,029 molecular structures drawn from 820 recent chemistry publications. It is designed to evaluate Optical Chemical Structure Recognition (OCSR) systems on authentic images rather than idealized ones. The benchmark is organized with MOSAIC, a dual-dimensional difficulty framework with 37 fine-grained labels covering visual interference and chemical semantics. To support accurate evaluation, the team also introduces CARBON, a representation language that can express valence changes, icon-based categories, and other non-standard chemical semantics. A dual-track evaluation protocol accepts both CARBON and SMILES outputs, so systems that emit either format can be scored.
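
As a rough illustration of what a dual-track protocol could look like in practice, the Python sketch below scores predictions separately per output track and per difficulty bucket. It is not the authors' implementation: the `Sample` fields, the `strip_whitespace` normalizer, the bucket names, and the use of exact string match as the metric are all assumptions made for this example. In particular, a real SMILES track would need proper canonicalization, and the CARBON strings shown in the usage note are opaque placeholders, since the brief does not describe CARBON's syntax.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One benchmark item. All field names here are hypothetical."""
    reference: Dict[str, str]  # track name -> ground-truth string
    prediction: str            # the system's output string
    track: str                 # which track the system answered in
    difficulty: str            # placeholder difficulty bucket


def strip_whitespace(s: str) -> str:
    # Placeholder normalizer (assumption): only removes whitespace.
    # It does NOT canonicalize SMILES or parse CARBON.
    return "".join(s.split())


# One normalizer per track, so each track can later get its own logic.
NORMALIZERS: Dict[str, Callable[[str], str]] = {
    "carbon": strip_whitespace,
    "smiles": strip_whitespace,
}


def exact_match_by_track(samples: List[Sample]) -> Dict[str, Dict[str, float]]:
    """Return accuracy[track][difficulty] using normalized exact match."""
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(lambda: defaultdict(int))
    for s in samples:
        normalize = NORMALIZERS[s.track]
        totals[s.track][s.difficulty] += 1
        if normalize(s.prediction) == normalize(s.reference[s.track]):
            hits[s.track][s.difficulty] += 1
    return {
        track: {level: hits[track][level] / n for level, n in levels.items()}
        for track, levels in totals.items()
    }
```

For example, calling `exact_match_by_track` on a SMILES sample that predicts `CCO` for reference `CCO`, a SMILES sample that predicts `C1CCCCC1` for reference `c1ccccc1`, and a CARBON sample whose placeholder strings match would report the two tracks separately, each broken down by its difficulty buckets. Keeping the normalizer per track is the main design choice: it lets the two output formats share one scoring loop while being compared under their own rules.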

Key facts

  • MolRecBench-Wild contains 5,029 structures from 820 recent chemistry papers.
  • MOSAIC is a dual-dimensional difficulty framework with 37 fine-grained labels.
  • CARBON is a new representation language for non-standard chemical semantics.
  • The benchmark covers the full difficulty spectrum of real publications.
  • A dual-track evaluation protocol supports both CARBON and SMILES outputs.
  • OCSR aims to translate molecular diagrams into machine-readable formats.
  • Current OCSR systems remain unreliable on real-world images.
  • The work is published on arXiv with ID 2605.05832.

Entities

Institutions

  • arXiv
