Agentic AI Architecture Automates Scientific Workflow Creation from Natural Language
A new agentic architecture automates the semantic translation from research questions to scientific workflow specifications. It has three layers. The semantic layer uses an LLM to convert natural language into structured intents. The deterministic layer uses validated generators to build reproducible workflow DAGs. The knowledge layer lets domain experts author 'Skills': markdown files that define vocabulary mappings, parameter limits, and optimization methods. This design confines LLM non-determinism to intent extraction, so identical intents yield identical workflows. The system was evaluated on the 1000 Genomes population genetics workflow, using Hyperflow WMS on Kubernetes. The paper is available on arXiv (2604.21910).
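The determinism claim can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Intent` fields, the task names, and the `build_dag` and `fingerprint` helpers are all assumptions made for illustration. The point it shows is that once the LLM has produced a structured intent, a plain function with no randomness can expand it into a DAG, so the same intent always gives the same workflow.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Hypothetical structured intent, as a semantic layer might emit it."""
    chromosomes: tuple[int, ...]
    populations: tuple[str, ...]
    chunks_per_chromosome: int


def build_dag(intent: Intent) -> dict[str, list[str]]:
    """Expand an intent into a DAG: task name -> list of upstream task names.

    Task names are invented for this sketch. Inputs are sorted so that the
    order in which the intent lists its values does not affect the result.
    """
    dag: dict[str, list[str]] = {}
    for chrom in sorted(intent.chromosomes):
        chunk_tasks = [
            f"parse_chr{chrom}_part{i}"
            for i in range(intent.chunks_per_chromosome)
        ]
        for task in chunk_tasks:
            dag[task] = []  # chunk parsers have no upstream dependencies
        merge = f"merge_chr{chrom}"
        dag[merge] = sorted(chunk_tasks)  # merge waits for every chunk
        for pop in sorted(intent.populations):
            dag[f"analyze_chr{chrom}_{pop}"] = [merge]
    return dag


def fingerprint(dag: dict[str, list[str]]) -> str:
    """Hash a canonical JSON form of the DAG to compare two workflows."""
    canonical = json.dumps(dag, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Calling `fingerprint(build_dag(intent))` twice on the same intent returns the same digest, which is one simple way to check reproducibility in a test suite.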
Key facts
- The architecture automates the semantic translation from research questions to workflow specifications.
- It consists of three layers: semantic, deterministic, and knowledge.
- The semantic layer uses an LLM to interpret natural language into structured intents.
- The deterministic layer uses validated generators to produce reproducible workflow DAGs.
- The knowledge layer contains 'Skills' authored by domain experts.
- LLM non-determinism is confined to intent extraction; identical intents produce identical workflows.
- The system was evaluated on the 1000 Genomes population genetics workflow.
- The evaluation ran on Hyperflow WMS deployed on Kubernetes.
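The role of a Skill can be sketched in Python as well. The source says only that Skills are markdown files covering vocabulary mappings, parameter limits, and optimization methods. The file layout below (`## Vocabulary`, `## Limits`, the `->` and `..` notation), the population terms, and the functions `parse_skill` and `normalize` are all hypothetical. The sketch shows how expert-authored mappings and limits might be applied to a raw intent before any workflow is generated.

```python
# Hypothetical Skill file; the real format is not described in the source.
SKILL_MD = """\
# Skill: population-genetics (hypothetical example)

## Vocabulary
- european -> EUR
- african -> AFR
- east asian -> EAS

## Limits
- chunks_per_chromosome: 1..64
"""


def parse_skill(text):
    """Read vocabulary mappings and numeric limits from the assumed layout."""
    vocabulary, limits, section = {}, {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("## "):
            section = line[3:].lower()
        elif line.startswith("- ") and section == "vocabulary":
            term, code = line[2:].split("->")
            vocabulary[term.strip().lower()] = code.strip()
        elif line.startswith("- ") and section == "limits":
            name, bounds = line[2:].split(":")
            low, high = bounds.strip().split("..")
            limits[name.strip()] = (int(low), int(high))
    return vocabulary, limits


def normalize(raw_intent, vocabulary, limits):
    """Map free-text terms to codes and reject out-of-range parameters."""
    populations = []
    for term in raw_intent["populations"]:
        key = term.strip().lower()
        if key not in vocabulary:
            raise ValueError(f"unknown population term: {term!r}")
        populations.append(vocabulary[key])
    out = dict(raw_intent, populations=sorted(populations))
    for name, (low, high) in limits.items():
        if name in out and not low <= out[name] <= high:
            raise ValueError(f"{name}={out[name]} outside [{low}, {high}]")
    return out
```

Under these assumptions, a raw intent that names "European" and "african" populations is normalized to the codes `AFR` and `EUR`, while an unknown term or an out-of-range chunk count raises `ValueError` instead of reaching the workflow generator.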
Entities
Institutions
- arXiv
- 1000 Genomes
- Hyperflow WMS
- Kubernetes