FHIR Serialisation Strategy Significantly Impacts LLM Medication Reconciliation Performance

ai-technology · 2026-04-25

A recent study published on arXiv (2604.21076) evaluates four FHIR serialisation strategies (Raw JSON, Markdown Table, Clinical Narrative, and Chronological Timeline) on five open-weight large language models (Phi-3.5-mini, Mistral-7B, BioMistral-7B, Llama-3.1-8B, Llama-3.3-70B). The benchmark covers 200 synthetic patients, giving 4,000 inference runs in total (200 patients × 4 strategies × 5 models). For models with up to 8B parameters, Clinical Narrative outperforms Raw JSON by as much as 19 F1 points (Mistral-7B, r=0.617, p<10^{-10}). At 70B parameters the trend reverses and Raw JSON performs best. The results indicate that the best serialisation strategy depends on model scale when LLMs are applied to FHIR-structured clinical data for medication reconciliation.
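To make the four strategies concrete, here is a minimal Python sketch that renders the same medication list in each format. It is an illustration only, not the study's implementation: the two records are invented, heavily simplified FHIR R4-style MedicationRequest resources, and the helper names (to_raw_json, to_markdown_table, to_clinical_narrative, to_timeline) and the exact wording of each format are assumptions.

```python
import json

# Hypothetical, heavily simplified FHIR R4-style MedicationRequest resources.
# The data is invented for illustration and is not taken from the benchmark.
MEDS = [
    {
        "resourceType": "MedicationRequest",
        "status": "active",
        "medicationCodeableConcept": {"text": "Metformin 500 mg tablet"},
        "authoredOn": "2025-01-10",
        "dosageInstruction": [{"text": "1 tablet twice daily"}],
    },
    {
        "resourceType": "MedicationRequest",
        "status": "stopped",
        "medicationCodeableConcept": {"text": "Lisinopril 10 mg tablet"},
        "authoredOn": "2024-11-02",
        "dosageInstruction": [{"text": "1 tablet once daily"}],
    },
]


def _fields(res):
    """Pull out the four fields every format below uses."""
    name = res["medicationCodeableConcept"]["text"]
    dose = res["dosageInstruction"][0]["text"]
    return name, dose, res["status"], res["authoredOn"]


def to_raw_json(resources):
    """Raw JSON: pass the resources through unchanged."""
    return json.dumps(resources, indent=2)


def to_markdown_table(resources):
    """Markdown Table: one row per medication."""
    rows = [
        "| Medication | Dosage | Status | Authored |",
        "| --- | --- | --- | --- |",
    ]
    for res in resources:
        rows.append("| " + " | ".join(_fields(res)) + " |")
    return "\n".join(rows)


def to_clinical_narrative(resources):
    """Clinical Narrative: one prose sentence per medication."""
    sentences = []
    for res in resources:
        name, dose, status, date = _fields(res)
        sentences.append(
            f"{name} ({dose}) was prescribed on {date}; current status: {status}."
        )
    return " ".join(sentences)


def to_timeline(resources):
    """Chronological Timeline: entries sorted by authoring date (ISO strings sort correctly)."""
    ordered = sorted(resources, key=lambda r: r["authoredOn"])
    lines = []
    for res in ordered:
        name, dose, status, date = _fields(res)
        lines.append(f"{date}: {name}, {dose} [{status}]")
    return "\n".join(lines)
```

Each function returns a string that could be placed in a prompt. All four carry the same clinical content, so any difference in model output would come from presentation alone.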

Key facts

Study compares four FHIR serialisation strategies: Raw JSON, Markdown Table, Clinical Narrative, Chronological Timeline.
Five open-weight models tested: Phi-3.5-mini, Mistral-7B, BioMistral-7B, Llama-3.1-8B, Llama-3.3-70B.
Benchmark uses 200 synthetic patients with 4,000 inference runs.
Clinical Narrative outperforms Raw JSON by up to 19 F1 points for Mistral-7B.
At 70B parameters, Raw JSON achieves the best performance.
Serialisation strategy has a statistically significant effect on models with up to 8B parameters.
Correlation coefficient r=0.617, p<10^{-10} for the Mistral-7B comparison.
Medication reconciliation at clinical handoffs is a high-stakes, error-prone process.
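
The "F1 points" quoted above are differences on a 0 to 100 scale, so 19 points corresponds to 0.19 on the usual 0 to 1 scale. The summary does not describe how the study scores a reconciled medication list, so the following Python sketch is only one plausible reading: a set-based F1 between a predicted list and a reference list, with case-insensitive exact matching. The function name medication_f1 and the matching rule are assumptions.

```python
def medication_f1(predicted, reference):
    """Set-based F1 between two medication lists.

    Assumption: names match when they are equal after trimming and
    lower-casing. A real pipeline would likely normalise drug names
    (for example, to coded concepts) before comparing.
    """
    pred = {m.strip().lower() for m in predicted}
    ref = {m.strip().lower() for m in reference}
    if not pred and not ref:
        return 1.0  # nothing to reconcile, nothing predicted
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# One correct medication, one spurious, one missed:
# precision and recall are both 1/2.
score = medication_f1(["Metformin", "Aspirin"], ["metformin", "lisinopril"])
```

Under this definition, a model that lists one correct medication, adds one spurious one, and misses one gets precision and recall of 0.5 each.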
