LLMs Outperform Mental Health Professionals in Diagnosing Personality Disorders

ai-technology · 2026-05-07

A recent study on arXiv (2512.20298) compares the diagnostic accuracy of large language models (LLMs) with that of mental health professionals in identifying Borderline Personality Disorder (BPD) and Narcissistic Personality Disorder (NPD) from first-person autobiographical narratives written in Polish. The top Gemini Pro models achieved a diagnostic score of 65.48%, exceeding the average human score of 43.57% by 21.91 percentage points. Both groups recognized BPD well (F1 = 83.4 for models, 80.0 for humans), but the models severely underdiagnosed NPD (F1 = 6.7 for models, 50.0 for humans), which suggests a reluctance to use the term "narcissism." The models gave confident, pattern-focused justifications, whereas the human professionals offered more nuanced evaluations. The results raise concerns about the reliability of LLMs for psychiatric self-assessment.

Key facts

Study compares LLMs and mental health professionals on diagnosing BPD and NPD
Uses Polish-language first-person autobiographical accounts
Top Gemini Pro models scored 65.48%; humans averaged 43.57%
Both models and humans excelled at BPD (F1 = 83.4 vs. 80.0)
Models underdiagnosed NPD (F1 = 6.7 vs. 50.0)
Models showed potential reluctance toward the term "narcissism"
Models provided confident, pattern-focused justifications
Published on arXiv with identifier 2512.20298
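
For readers unfamiliar with the metrics, the sketch below shows how the percentage-point gap and an F1 score are computed. F1 is the harmonic mean of precision and recall, scaled here to 0-100 to match the figures above. The function name and the confusion counts are hypothetical and chosen only for illustration; they are not taken from the study and do not reproduce its reported F1 values.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 (harmonic mean of precision and recall) on a 0-100 scale."""
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


# Scores reported in the summary, in percent.
model_score = 65.48
human_score = 43.57

# The gap is a difference of percentages, so it is in percentage points.
gap = round(model_score - human_score, 2)
print(f"gap: {gap} percentage points")

# Hypothetical counts (NOT from the study): a rater that labels only 1 of 10
# true cases, with no false alarms, has perfect precision but low recall.
hypothetical_f1 = f1_score(tp=1, fp=0, fn=9)
print(f"hypothetical F1: {hypothetical_f1:.2f}")
```

The hypothetical case illustrates why underdiagnosis drags F1 down: a rater who rarely assigns a label can be correct every time it does so and still score poorly, because most true cases are missed.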