ARTFEED — Contemporary Art Intelligence

DiagramBank Dataset Enables AI-Generated Scientific Diagrams

ai-technology · 2026-04-25

Researchers have introduced DiagramBank, a large-scale dataset of 89,422 schematic diagrams sourced from top-tier scientific publications. Designed to address a bottleneck in autonomous "AI scientist" systems, the dataset enables multimodal retrieval and exemplar-driven generation of publication-grade scientific figures, such as teaser figures. Unlike derivative data plots, these diagrams require conceptual synthesis to translate complex logic into compelling visuals. The dataset is intended to support retrieval-augmented generation for scientific figure creation, filling a gap: existing AI systems often omit such diagrams altogether or produce inferior alternatives. The work is detailed in arXiv preprint 2604.20857.
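
To make the idea of exemplar retrieval concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the DiagramBank authors' method: the diagram IDs, the three-dimensional embeddings, and the helper names (cosine_similarity, retrieve_exemplars, build_prompt) are all hypothetical, and a real system would obtain its embeddings from a multimodal encoder rather than hand-written vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_exemplars(query_embedding, diagram_index, top_k=2):
    """Return the IDs of the top_k diagrams most similar to the query embedding."""
    scored = [
        (cosine_similarity(query_embedding, embedding), diagram_id)
        for diagram_id, embedding in diagram_index.items()
    ]
    # Sort by similarity only, highest first; ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [diagram_id for _, diagram_id in scored[:top_k]]


def build_prompt(figure_request, exemplar_ids):
    """Assemble a text prompt that hands retrieved exemplars to a generator (hypothetical format)."""
    exemplar_list = ", ".join(exemplar_ids)
    return f"Request: {figure_request}\nReference exemplars: {exemplar_list}"


# Toy index: hypothetical diagram IDs mapped to made-up embeddings.
diagram_index = {
    "diagram_pipeline": [0.9, 0.1, 0.0],
    "diagram_architecture": [0.7, 0.6, 0.1],
    "diagram_taxonomy": [0.0, 0.2, 0.9],
}

# Made-up embedding standing in for an encoded figure request.
query_embedding = [1.0, 0.2, 0.0]

exemplars = retrieve_exemplars(query_embedding, diagram_index, top_k=2)
prompt = build_prompt("Teaser figure for a three-stage training pipeline", exemplars)
print(prompt)
```

The retrieved IDs would then accompany the request to whatever model produces the figure. The brute-force scan shown here is only practical for a toy index; at the scale of tens of thousands of diagrams, an approximate nearest-neighbor index would be the more likely choice.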

Key facts

  • DiagramBank contains 89,422 schematic diagrams from top-tier scientific publications.
  • The dataset is designed for multimodal retrieval and exemplar-driven scientific figure generation.
  • It addresses a bottleneck in autonomous AI scientist systems for producing publication-grade diagrams.
  • Teaser figures serve as strategic visual interfaces requiring conceptual synthesis.
  • Existing AI systems often omit scientific diagrams or produce inferior alternatives.
  • The dataset supports retrieval-augmented generation for scientific figure creation.
  • The research is published on arXiv with ID 2604.20857.
  • The dataset targets schematic diagrams, not derivative data plots.

Entities

Institutions

  • arXiv
