RAG-Coding: LLM Agent System Improves Medical Code Accuracy
Researchers have developed RAG-Coding, an agentic approach to automated ICD-10-CM coding in which four large language model (LLM) agents ground their coding decisions in external resources such as the official coding tabular list and guidelines. By retrieving and cross-referencing the relevant information, the method improves coding accuracy and maintains clinical compliance. In tests on the MDACE dataset, RAG-Coding surpassed the best LLM-based baseline by 8-13% in micro-F1 and 2-8% in macro-F1 across various LLM frameworks. Against the leading pretrained language model, PLM-ICD, RAG-Coding achieved 11% higher micro recall while PLM-ICD achieved 6% higher micro precision, so the two ended up with similar micro- and macro-F1 scores. Ablation studies show stepwise improvements as external knowledge is added, underscoring the value of integrating it. The researchers also released MDACE-2025, an enhanced version of the original dataset with expert re-annotations.
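The results above are reported as micro- and macro-averaged scores, which can diverge noticeably on medical coding data. The following is a minimal sketch of how the two averages are typically computed for multi-label code assignment. The function name `micro_macro_f1`, the toy notes, and the example ICD-10-CM codes are illustrative assumptions and are not taken from the RAG-Coding evaluation or from MDACE.

```python
from typing import Dict, List, Set, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


def micro_macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Compare gold and predicted code sets, one set per clinical note."""
    labels = sorted(set().union(*gold, *pred))
    counts = {code: [0, 0, 0] for code in labels}  # [tp, fp, fn] per code
    for g, p in zip(gold, pred):
        for code in p & g:
            counts[code][0] += 1  # correctly predicted
        for code in p - g:
            counts[code][1] += 1  # predicted but not in gold
        for code in g - p:
            counts[code][2] += 1  # in gold but missed

    # Micro: pool the counts over all codes, then compute the metrics once.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro_p, micro_r, micro_f1 = prf(tp, fp, fn)

    # Macro: compute F1 per code, then take the unweighted mean.
    per_code_f1 = [prf(*c)[2] for c in counts.values()]
    macro_f1 = sum(per_code_f1) / len(per_code_f1) if per_code_f1 else 0.0

    return {
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "micro_f1": micro_f1,
        "macro_f1": macro_f1,
    }


# Toy data: three notes with hypothetical gold and predicted code sets.
gold = [{"E11.9", "I10"}, {"I10"}, {"J45.909"}]
pred = [{"E11.9", "I10", "N18.9"}, {"I10"}, set()]
scores = micro_macro_f1(gold, pred)
print(scores)
```

In this toy case the frequent codes are predicted correctly, so the micro scores stay high, while the one spurious code and the one missed code each contribute a per-code F1 of zero and pull the macro average down. The same mechanics explain how a system with higher recall and another with higher precision can end up with similar F1 scores.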
Key facts
- RAG-Coding uses four LLM agents for ICD-10-CM coding.
- Agents ground decisions in external knowledge sources: official coding tabular list and guidelines.
- On the MDACE dataset, RAG-Coding outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1.
- Compared with PLM-ICD, RAG-Coding has higher micro recall (+11%) while PLM-ICD has higher micro precision (+6%), yielding similar micro- and macro-F1 scores.
- Ablation studies show stepwise gains from incorporating external knowledge.
- MDACE-2025 dataset released with expert re-annotations.
- The method maintains clinical compliance by cross-referencing these external sources.
- RAG-Coding is an agentic method for automated medical coding.