Agentic AI Matches Expert Consensus on Myeloma Treatment Decisions

ai-technology · 2026-04-29

A recent study posted to arXiv (2604.24473) examines whether LLM-based systems can reason over complex longitudinal medical records in a way that aligns with expert clinical reasoning in multiple myeloma. The retrospective study analysed data from 811 patients at a tertiary centre spanning 2001 to 2026, comprising 44,962 documents and 1,334,677 laboratory values, with external validation on MIMIC-IV. It compared an agentic reasoning system against single-pass RAG, iterative RAG, and full-context input on 469 patient-question pairs drawn from 48 templates at three complexity levels. Reference labels came from double annotation by four oncologists, with final adjudication by a senior haematologist. Iterative RAG and full-context input converged on a shared reasoning path approaching expert agreement, suggesting that agentic reasoning over longitudinal records could support treatment decisions in complex diseases such as multiple myeloma.

Key facts

Study from arXiv (2604.24473) evaluates LLM-based clinical reasoning in multiple myeloma.
Retrospective analysis of 811 patients from a tertiary centre (2001–2026).
Dataset includes 44,962 documents and 1,334,677 laboratory values.
External validation performed on MIMIC-IV dataset.
Agentic reasoning system compared against single-pass RAG, iterative RAG, and full-context input.
469 patient-question pairs from 48 templates at three complexity levels.
Reference labels from double annotation by four oncologists with senior haematologist adjudication.
Iterative RAG and full-context input converged on a shared reasoning path approaching expert agreement.
