MedSyn: LLM Dialogue Boosts Emergency Diagnostic Accuracy

ai-technology · 2026-05-12

A recent preprint on arXiv (2605.08533) presents MedSyn, a system in which physicians initially see only a patient's chief complaint and then iteratively query a language model that has access to the complete clinical record. Seven physicians (three seniors and four residents) each completed baseline and AI-assisted sessions across 52 MIMIC-IV cases. Residents' correctness on Hard cases rose from 0.589 to 0.734. Standardized completely-correct rates showed a medium effect, although the result did not reach the conventional 0.05 significance threshold (Δ = 0.092; p = 0.071; d = 0.47). Automated assessments also showed improvements: standardized any-match accuracy increased by 0.156 (p < 0.0001), and residents achieved the largest F1 gain (Δ = 0.138; p < 0.0001). Dialogue analysis revealed expertise-dependent strategies, with seniors asking more targeted questions.

Key facts

MedSyn lets physicians iteratively query an LLM with full clinical records while initially viewing only the chief complaint.
Seven physicians (three seniors, four residents) completed baseline and AI-assisted sessions across 52 MIMIC-IV cases.
Residents' Hard-case correctness rose from 0.589 to 0.734.
Standardized completely-correct rates showed a medium effect (Δ = 0.092; p = 0.071; d = 0.47).
Standardized any-match accuracy improved by 0.156 (p < 0.0001).
Residents showed the largest F1 gain (Δ = 0.138; p < 0.0001).
Dialogue analysis revealed expertise-dependent strategies.
Seniors asked more targeted questions.
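
To make the reported figures easier to read, the following minimal Python sketch works through the arithmetic behind the residents' Hard-case gain and shows one common way a paired effect size such as d can be computed. The function names are hypothetical, the per-physician scores in the usage lines are invented purely for illustration, and the mean-difference-over-standard-deviation formula is only one convention for Cohen's d in a within-subject design; the summary does not say which variant the study used.

```python
from statistics import mean, stdev


def absolute_gain(baseline: float, assisted: float) -> float:
    """Difference between the AI-assisted and baseline rates."""
    return assisted - baseline


def relative_gain(baseline: float, assisted: float) -> float:
    """Gain expressed as a fraction of the baseline rate."""
    return (assisted - baseline) / baseline


def paired_cohens_d(baseline_scores, assisted_scores) -> float:
    """Paired effect size: mean of the per-subject differences divided by
    their sample standard deviation. This is an assumed convention, not
    necessarily the one used in the study."""
    diffs = [a - b for a, b in zip(assisted_scores, baseline_scores)]
    return mean(diffs) / stdev(diffs)


if __name__ == "__main__":
    # Figures reported in the summary: residents, Hard cases.
    print(round(absolute_gain(0.589, 0.734), 3))  # 0.145
    print(round(relative_gain(0.589, 0.734), 3))  # 0.246

    # Invented per-physician scores, for illustration only.
    hypothetical_baseline = [0.5, 0.6, 0.7]
    hypothetical_assisted = [0.6, 0.8, 0.7]
    print(paired_cohens_d(hypothetical_baseline, hypothetical_assisted))
```

Read this way, the move from 0.589 to 0.734 is an absolute gain of 0.145, or roughly a 25% improvement relative to the residents' baseline.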
