ARTFEED — Contemporary Art Intelligence

BioMiner: Multi-modal AI System for Automated Drug Discovery Data Extraction

ai-technology · 2026-04-25

Researchers have developed BioMiner, a multi-modal extraction framework that automates the mining of protein-ligand bioactivity data from scientific literature. Manual curation cannot keep pace with the volume of new publications, and BioMiner addresses that bottleneck by separating two tasks: interpreting bioactivity semantics and constructing ligand structures. It uses multi-modal large language models operating on chemically grounded visual representations to infer how structures relate to one another and to reconstruct chemically exact molecules, including complex Markush structures. The framework is designed to extract this data, a key input to drug discovery, even when it is spread across text, tables, and figures. The paper is available on arXiv under ID 2604.21508.
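To make the separation of the two tasks concrete, here is a minimal Python sketch. It is not BioMiner's code: the record types, field names, the `merge_records` helper, and the sample values are all hypothetical, chosen only to show how activity measurements and ligand structures could be extracted independently and then joined on the compound label used in a paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BioactivityRecord:
    """One measurement as read from text or a table (hypothetical schema)."""
    ligand_label: str  # label used in the paper, e.g. "12a"
    target: str
    measure: str       # e.g. "IC50"
    value: float
    unit: str


@dataclass(frozen=True)
class LigandStructure:
    """One reconstructed structure (hypothetical schema)."""
    ligand_label: str
    smiles: str


def merge_records(activities, structures):
    """Join the two independently extracted streams on the ligand label.

    Returns (merged, unresolved): merged is a list of dicts, unresolved
    lists labels that had a measurement but no reconstructed structure.
    """
    smiles_by_label = {s.ligand_label: s.smiles for s in structures}
    merged, unresolved = [], []
    for act in activities:
        smiles = smiles_by_label.get(act.ligand_label)
        if smiles is None:
            unresolved.append(act.ligand_label)
            continue
        merged.append({
            "ligand_label": act.ligand_label,
            "smiles": smiles,
            "target": act.target,
            "measure": act.measure,
            "value": act.value,
            "unit": act.unit,
        })
    return merged, unresolved


# Illustrative, made-up inputs: two measurements, one known structure.
activities = [
    BioactivityRecord("12a", "EGFR", "IC50", 4.2, "nM"),
    BioactivityRecord("12b", "EGFR", "IC50", 15.0, "nM"),
]
structures = [LigandStructure("12a", "Cc1ccc(cc1)O")]

merged, unresolved = merge_records(activities, structures)
print(merged)
print(unresolved)
```

Keeping the two streams apart until the final join means an error in structure reconstruction leaves the activity values untouched, and unmatched labels surface explicitly instead of being silently dropped.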

Key facts

  • BioMiner is a multi-modal extraction framework for protein-ligand bioactivity data.
  • It separates bioactivity semantic interpretation from ligand structure construction.
  • The system uses multi-modal large language models on chemically grounded visual representations.
  • It can reconstruct chemically exact ligand structures, including Markush structures.
  • The paper is available on arXiv under ID 2604.21508.
  • Manual curation struggles to keep pace with rapidly growing literature.
  • Automated bioactivity extraction requires interpreting biochemical semantics across text, tables, and figures.
  • BioMiner infers bioactivity semantics through direct reasoning.
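For readers unfamiliar with Markush structures: they describe a family of compounds as a shared scaffold with variable substituents (R-groups), and the specific compounds are usually listed in a table. The sketch below shows the general idea of expanding such a family into exact structures. It is an illustration under stated assumptions, not BioMiner's method: the scaffold, the R-group table, and the `expand_markush` helper are hypothetical, and plain string substitution into a SMILES template stands in for the cheminformatics tooling a real system would need.

```python
# Hypothetical scaffold: a para-disubstituted benzene ring with two
# variable positions, written as a SMILES template.
SCAFFOLD = "{R1}c1ccc(cc1){R2}"

# Hypothetical R-group table, as it might appear in a paper:
# compound label -> substituent at each variable position.
R_GROUP_TABLE = {
    "1a": {"R1": "C", "R2": "O"},    # methyl, hydroxyl
    "1b": {"R1": "Cl", "R2": "N"},   # chloro, amino
}


def expand_markush(scaffold, r_group_table):
    """Fill the scaffold's placeholders with each compound's R-groups.

    Returns a dict mapping compound label to a full SMILES string.
    No chemical validation is performed in this sketch.
    """
    return {
        label: scaffold.format(**groups)
        for label, groups in r_group_table.items()
    }


structures = expand_markush(SCAFFOLD, R_GROUP_TABLE)
print(structures)
# {'1a': 'Cc1ccc(cc1)O', '1b': 'Clc1ccc(cc1)N'}
```

Each expanded string names one concrete compound (here 4-methylphenol and 4-chloroaniline), which is the form a bioactivity value can be attached to.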

Entities

Institutions

  • arXiv
