StructSense: AI Framework for Structured Info Extraction from Scientific Literature

ai-technology · 2026-05-23

Researchers have developed StructSense, an open-source framework that extracts structured data from scientific literature in a modular, task-agnostic way. It improves domain-specific extraction by combining ontology-guided symbolic knowledge, agentic self-evaluative refinement, and human-in-the-loop validation. StructSense was evaluated on three tasks of varying semantic complexity: it reached 91–100% accuracy on schema-based extraction of assessment instruments, 86–93% overall on metadata and resource extraction from scientific papers, and 58–75% label accuracy on named entity recognition (NER) from neuroscience literature across 8,882 entities. On two biomedical NER benchmarks, NCBI Disease and S800 Species, it recorded ≥90% relaxed recall and 62.5% exact match. The paper is available on arXiv under reference 2507.03674.

Key facts

StructSense is a modular, task-agnostic, open-source framework.
It integrates ontology-guided symbolic knowledge, agentic self-evaluative refinement, and human-in-the-loop validation.
Achieved 91–100% accuracy on schema-based extraction of assessment instruments.
Achieved 86–93% overall on metadata and resource extraction from scientific papers.
Achieved 58–75% label accuracy on NER from neuroscience literature across 8,882 entities.
On NCBI Disease and S800 Species benchmarks, achieved ≥90% relaxed recall and 62.5% exact match.
Published on arXiv under reference 2507.03674.
Addresses LLM limitations in specialized domains.
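
The gap between the relaxed-recall and exact-match figures above is easier to read with a small illustration. The Python sketch below is not StructSense code, and the source does not say how its relaxed matching is computed. It assumes one common convention: a predicted entity counts as a relaxed match when it has the same label as a gold entity and overlaps it by at least one character, while an exact match requires identical boundaries and label. The spans and the helper names (`exact_match`, `relaxed_match`, `recall`) are made up for illustration.

```python
from typing import Callable, List, Tuple

# (start offset, end offset exclusive, label); hypothetical representation
Span = Tuple[int, int, str]


def exact_match(pred: Span, gold: Span) -> bool:
    """Boundaries and label must be identical."""
    return pred == gold


def relaxed_match(pred: Span, gold: Span) -> bool:
    """Assumed convention: same label and at least one overlapping character."""
    same_label = pred[2] == gold[2]
    overlaps = pred[0] < gold[1] and gold[0] < pred[1]
    return same_label and overlaps


def recall(preds: List[Span], golds: List[Span],
           match: Callable[[Span, Span], bool]) -> float:
    """Fraction of gold entities matched by at least one prediction."""
    if not golds:
        return 0.0
    found = sum(1 for g in golds if any(match(p, g) for p in preds))
    return found / len(golds)


# Toy data, invented for this example only.
gold = [(0, 12, "Disease"), (20, 35, "Disease"),
        (40, 48, "Species"), (60, 70, "Disease")]
pred = [(0, 12, "Disease"),   # identical to the first gold span
        (22, 35, "Disease"),  # boundary off by two characters
        (40, 45, "Species")]  # truncated span

exact_recall = recall(pred, gold, exact_match)
relaxed_recall = recall(pred, gold, relaxed_match)
print(f"exact recall:   {exact_recall:.2f}")    # 0.25
print(f"relaxed recall: {relaxed_recall:.2f}")  # 0.75
```

In this toy case the system finds three of the four gold entities but gets only one boundary exactly right, which is the kind of pattern that produces high relaxed recall alongside a much lower exact-match score.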
