EpiKG Boosts Clinical Retrieval by 22 Points in Assertion-Aware QA

publication · 2026-05-13

A new study introduces ClinicalBench, a 400-question benchmark for assertion-aware retrieval built from the real electronic health records of 43 patients in the MIMIC-IV database. The benchmark covers nine assertion-sensitive categories, including negation and the distinction between statements about the patient and statements about family members. The accompanying EpiKG system attaches an assertion label and a temporality tag to each fact in a patient knowledge graph, then routes retrieval according to the intent of the question. In tests involving six large language models, EpiKG achieved a gain of 22.0 percentage points on the primary endpoint (95% Newcombe CI [+5.1, +31.5], p=0.0192), underscoring how difficult practical clinical information retrieval remains.
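To make the idea of assertion-aware routing concrete, here is a minimal sketch and not the EpiKG implementation. The `Fact` class, the label sets ("present", "absent", "family"; "current", "historical"), and the keyword rules in `infer_intent` are all assumptions invented for illustration; the study's actual labels and routing logic are not described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    concept: str
    assertion: str    # hypothetical labels: "present", "absent", "family"
    temporality: str  # hypothetical tags: "current", "historical"


def infer_intent(question: str) -> Tuple[str, Optional[str]]:
    """Toy keyword rules mapping a question to (assertion, temporality) filters."""
    q = question.lower()
    if "family" in q:
        assertion = "family"
    elif "ruled out" in q or "denie" in q:  # matches "denies" / "denied"
        assertion = "absent"
    else:
        assertion = "present"
    if "history of" in q or "previously" in q:
        temporality = "historical"
    elif "currently" in q:
        temporality = "current"
    else:
        temporality = None  # the question states no temporal constraint
    return assertion, temporality


def retrieve(facts: List[Fact], concept: str, question: str) -> List[Fact]:
    """Return only the facts whose tags match the intent of the question."""
    assertion, temporality = infer_intent(question)
    return [
        f for f in facts
        if f.concept == concept
        and f.assertion == assertion
        and (temporality is None or f.temporality == temporality)
    ]


# Invented example record, not MIMIC-IV data.
facts = [
    Fact("diabetes", "family", "historical"),
    Fact("chest pain", "absent", "current"),
    Fact("hypertension", "present", "current"),
    Fact("hypertension", "present", "historical"),
]

# A family-history mention must not answer a question about the patient.
patient_hits = retrieve(facts, "diabetes", "Does the patient have diabetes?")
family_hits = retrieve(facts, "diabetes", "Is there a family history of diabetes?")
negated_hits = retrieve(facts, "chest pain", "Was chest pain ruled out?")
current_hits = retrieve(facts, "hypertension", "Is the patient currently hypertensive?")
```

In this toy record, a retriever that matched on the concept alone would return the family-history fact for the first question. Filtering on the tags returns nothing instead, which is the kind of error an assertion-aware benchmark is meant to expose.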

Key facts

ClinicalBench contains 400 questions over 43 MIMIC-IV patients across 9 assertion-sensitive categories.
EpiKG adds assertion labels and temporality tags to each fact in a patient knowledge graph.
Six LLMs were tested: Claude Opus 4.6, GPT-OSS 20B, MedGemma 27B, Gemma 4 31B, MedGemma 1.5 4B, Qwen 3.5 35B.
Three physicians blindly adjudicated 100 paired items.
Two external physicians rated 50 unanimous-strict items for the primary endpoint.
Primary endpoint: +22.0 percentage points (95% Newcombe CI [+5.1, +31.5], p=0.0192).
The study is published as arXiv:2605.11143.
EpiKG routes retrieval by question intent based on assertion and temporality.
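
The primary endpoint above is reported with a Newcombe confidence interval. As a rough illustration of how such an interval is computed, the sketch below implements Newcombe's hybrid score interval for the difference between two independent proportions, which combines a Wilson score interval for each proportion. It is an assumption that this unpaired variant is the relevant one: the study adjudicated paired items, so it may have used Newcombe's paired method instead. The counts are illustrative and are not the study's data.

```python
from math import sqrt
from statistics import NormalDist


def wilson(successes, n, z):
    """Wilson score interval for a single proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def newcombe_diff(x1, n1, x2, n2, level=0.95):
    """Newcombe hybrid score interval for p1 - p2 (independent samples)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson(x1, n1, z)
    l2, u2 = wilson(x2, n2, z)
    d = p1 - p2
    lower = d - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return d, lower, upper


# Illustrative counts only: 56/70 correct with one system, 48/80 with another.
d, lo, hi = newcombe_diff(56, 70, 48, 80)
print(f"{100 * d:+.1f} pp, 95% CI [{100 * lo:+.1f}, {100 * hi:+.1f}]")
# prints: +20.0 pp, 95% CI [+5.2, +33.4]
```

Score-based intervals of this kind are generally not symmetric around the point estimate, which is consistent with the reported interval [+5.1, +31.5] around +22.0.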
