Agentic AI Matches Expert Consensus on Myeloma Treatment Decisions
A recent study posted to arXiv (2604.24473) examines whether LLM-based systems can reason over complex longitudinal medical records in a way that aligns with expert clinical reasoning in multiple myeloma. The retrospective study analysed data from 811 patients at a tertiary centre spanning 2001 to 2026, comprising 44,962 documents and 1,334,677 laboratory values, with external validation on MIMIC-IV. It compared an agentic reasoning system against single-pass RAG, iterative RAG, and full-context input on 469 patient-question pairs drawn from 48 templates at three complexity levels. Reference labels came from double annotation by four oncologists, with final adjudication by a senior haematologist. Findings indicated that iterative RAG and full-context input converged on a shared reasoning path approaching expert agreement, suggesting that agentic reasoning over longitudinal records could support treatment decisions in complex diseases such as multiple myeloma.
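As an illustration of the labelling protocol described above, the following is a minimal sketch of how a reference label could be derived from double annotation with adjudication, and how a system's answers could then be scored against it. The function names, the example answer labels ("continue", "switch"), and the simple match-rate metric are assumptions made for illustration; the summary does not specify the study's actual label scheme or agreement metric.

```python
from typing import Optional


def reference_label(label_a: str, label_b: str, adjudicated: Optional[str] = None) -> str:
    """Return the reference label for one patient-question pair.

    If the two annotators agree, their shared label is used. Otherwise the
    adjudicator's label is required (hypothetical protocol, for illustration).
    """
    if label_a == label_b:
        return label_a
    if adjudicated is None:
        raise ValueError("annotators disagree and no adjudicated label was provided")
    return adjudicated


def agreement_rate(predictions: dict[str, str], references: dict[str, str]) -> float:
    """Fraction of reference pairs for which the system answer matches the label."""
    if not references:
        raise ValueError("no reference labels to compare against")
    matches = sum(predictions.get(key) == label for key, label in references.items())
    return matches / len(references)


# Hypothetical data: (annotator A, annotator B, adjudicated label or None).
annotations = {
    "q1": ("continue", "continue", None),   # annotators agree
    "q2": ("switch", "continue", "switch"),  # disagreement resolved by adjudication
}
references = {key: reference_label(*labels) for key, labels in annotations.items()}

# Hypothetical system output for the same two questions.
predictions = {"q1": "continue", "q2": "continue"}
rate = agreement_rate(predictions, references)
```

In this toy case the system matches one of the two reference labels. A real evaluation would likely use a chance-corrected statistic and report results per complexity level, but those details are not given in the summary.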
Key facts
- Study from arXiv (2604.24473) evaluates LLM-based clinical reasoning in multiple myeloma.
- Retrospective analysis of 811 patients from a tertiary centre (2001–2026).
- Dataset includes 44,962 documents and 1,334,677 laboratory values.
- External validation performed on the MIMIC-IV dataset.
- Agentic reasoning system compared against single-pass RAG, iterative RAG, and full-context input.
- 469 patient-question pairs from 48 templates at three complexity levels.
- Reference labels from double annotation by four oncologists with senior haematologist adjudication.
- Iterative RAG and full-context input converged on a shared reasoning path approaching expert agreement.
Entities
Institutions
- arXiv
- MIMIC-IV