MDIA Multi-Agent Pipeline Outperforms ChatGPT for Clinicians on HealthBench

ai-technology · 2026-05-26

Researchers have introduced MDIA (Multi-agent Diagnostic Intelligence Agent), a 7-node, specialty-routed clinical reasoning graph. Running on OpenAI's GPT-5.4-2026-03-05 without fine-tuning, the system scored 0.6272 on the full HealthBench Professional benchmark (n=525), 3.72 percentage points ahead of OpenAI's ChatGPT for Clinicians. The gain is attributed to the system's architecture and engine rather than to prompt engineering. The key architectural features are specialty routing, multi-turn context preservation, drug-state safety gating, site-filtered search, length-aware synthesis, and engine-level reliability. The findings, published on arXiv (ID 2605.24699), support the view that performance on agentic clinical benchmarks depends on both the foundation model and the orchestration architecture.

Key facts

MDIA is a Multi-agent Diagnostic Intelligence Agent implemented as a 7-node specialty-routed clinical reasoning graph.
Tested on the full HealthBench Professional benchmark (n=525) using a non-fine-tuned LLM.
Achieved a score of 0.6272 running on OpenAI's GPT-5.4-2026-03-05.
Outperformed OpenAI's ChatGPT for Clinicians by 3.72 percentage points.
Performance lift attributed to system architecture, not prompt engineering.
Key architectural features: specialty routing, multi-turn context preservation, drug-state safety gating, site-filtered search, length-aware synthesis, engine-level reliability.
Findings published on arXiv with ID 2605.24699.
Study supports the view that agentic clinical benchmark performance is shaped by both the foundation model and the orchestration architecture.
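
To make "specialty routing" concrete, the following is a minimal sketch of the general idea only: a router node picks a specialty handler and passes the query to it. It is not MDIA's implementation. The keyword table, the function names (route, make_node, run_pipeline), and the two-step flow are all hypothetical, and the summary above does not describe how MDIA's seven nodes are actually defined or connected.

```python
from typing import Callable, Dict, Tuple

# Hypothetical keyword table for illustration; not taken from the MDIA paper.
SPECIALTY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "cardiology": ("chest pain", "palpitations"),
    "endocrinology": ("insulin", "thyroid"),
}


def route(query: str) -> str:
    """Router node: return the first specialty whose keyword appears in the query."""
    text = query.lower()
    for specialty, keywords in SPECIALTY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return specialty
    return "general"  # fallback when no keyword matches


def make_node(specialty: str) -> Callable[[str], str]:
    """Build a placeholder specialty node; a real node would call an LLM."""
    def node(query: str) -> str:
        return f"[{specialty}] draft answer for: {query}"
    return node


# One handler per specialty, plus the general fallback.
NODES: Dict[str, Callable[[str], str]] = {
    name: make_node(name) for name in (*SPECIALTY_KEYWORDS, "general")
}


def run_pipeline(query: str) -> str:
    """Route the query, then hand it to the selected specialty node."""
    return NODES[route(query)](query)
```

In this sketch, a call such as run_pipeline("Thyroid nodule follow-up") is dispatched to the endocrinology node, while a query with no matching keyword falls through to the general node. Keyword matching is used here purely for brevity; a production router would more plausibly use a classifier or the model itself.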
