Automated ICD Classification of Psychiatric Diagnoses Using NLP and LLMs

other · 2026-05-22

This study aims to automate the coding of psychiatric diagnoses by mapping free-text diagnostic descriptions to International Classification of Diseases (ICD) codes with Natural Language Processing (NLP) and Machine Learning (ML), reducing the administrative burden of coding clinical diagnoses. The researchers worked with a dataset of 145,513 psychiatric descriptions written in Spanish. They compared several text representation methods, ranging from frequency-based models such as Bag of Words (BoW) and TF-IDF to Large Language Models (LLMs) such as e5_large, BioLORD, and Llama-3-8B. Transformer-based embeddings outperformed the traditional methods, capturing intricate medical language more effectively. The e5_large model, after extensive fine-tuning, achieved the highest F1_micro score, 0.866.
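
To make the frequency-based end of that comparison concrete, here is a minimal sketch of a TF-IDF baseline that assigns an ICD code by nearest-neighbour cosine similarity. It is not the study's pipeline: the classifier choice, the whitespace tokenizer, the smoothed IDF formula, and all function names are assumptions, and the three Spanish descriptions are invented toy examples rather than data from the study.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (assumption; real clinical text needs more care).
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency over the training descriptions.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf_vector(text, idf):
    # Sparse, L2-normalised TF-IDF vector; terms unseen in training are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(a, b):
    # Dot product of two L2-normalised sparse vectors.
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def predict(text, train_vecs, train_codes, idf):
    # Return the ICD code of the most similar training description.
    query = tfidf_vector(text, idf)
    best = max(range(len(train_vecs)), key=lambda i: cosine(query, train_vecs[i]))
    return train_codes[best]

# Invented toy data, for illustration only.
train_texts = [
    "episodio depresivo moderado",
    "trastorno de ansiedad generalizada",
    "esquizofrenia paranoide",
]
train_codes = ["F32.1", "F41.1", "F20.0"]

idf = fit_idf(train_texts)
train_vecs = [tfidf_vector(t, idf) for t in train_texts]
print(predict("ansiedad generalizada", train_vecs, train_codes, idf))
```

A representation like this only matches descriptions that share surface tokens, which is one plausible reason embedding models cope better with varied clinical wording.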

Key facts

Dataset of 145,513 Spanish psychiatric descriptions used
Models evaluated: BoW, TF-IDF, e5_large, BioLORD, Llama-3-8B
Transformer-based embeddings outperformed traditional methods
e5_large achieved highest F1_micro score of 0.866
Study addresses administrative burden in coding clinical diagnoses
Focus on mapping free-text to ICD codes
Research demonstrates potential of LLMs for psychiatric diagnostic coding
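
For readers unfamiliar with the F1_micro metric cited above, the following sketch shows how a micro-averaged F1 score is typically computed: true positives, false positives, and false negatives are pooled across all descriptions before the score is taken. The function name and the toy gold and predicted codes are assumptions for illustration; the summary does not state whether the study's task is single-label or multi-label, so the sketch accepts a set of codes per description and covers both cases.

```python
def micro_f1(gold, pred):
    # gold, pred: one set of ICD codes per description, in the same order.
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # codes predicted and correct
        fp += len(p - g)   # codes predicted but not in the gold set
        fn += len(g - p)   # gold codes that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Invented toy labels, for illustration only.
gold = [{"F32.1"}, {"F41.1"}, {"F20.0", "F10.2"}]
pred = [{"F32.1"}, {"F41.0"}, {"F20.0"}]

# Pooled counts: tp = 2, fp = 1, fn = 2, so F1 = 4 / 7.
print(round(micro_f1(gold, pred), 3))
```

Because the counts are pooled, frequent codes weigh more heavily in the result than rare ones, which is worth keeping in mind when reading a single headline score.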

Sources

arXiv cs.AI — 2026-05-21