15 AI Chatbots Evaluated for Psychiatric Triage Accuracy
A recent study posted to arXiv (2604.25415) evaluated 15 frontier AI chatbots on psychiatric triage using 112 clinical vignettes. Each vignette presented a realistic single-message disclosure and carried one of four triage labels: A (routine), B (assessment within 1 week), C (assessment within 24–48 hours), or D (emergency care now). The vignettes covered 9 clusters of psychiatric presentations and 9 focal risk dimensions, organized into 28 presentation-by-risk groups of 4 distinct vignettes each (28 × 4 = 112). Each chatbot was asked to assign the appropriate triage label to every vignette. The study highlights a core difficulty of AI-driven psychiatric triage: urgency must be inferred from subjective indicators rather than concrete evidence.
Key facts
- Study evaluated 15 frontier AI chatbots
- Used 112 clinical vignettes
- Four triage labels: A (routine), B (within 1 week), C (within 24–48 hours), D (emergency now)
- Vignettes covered 9 psychiatric presentation clusters
- Vignettes covered 9 focal risk dimensions
- 28 presentation-by-risk groups
- Each group had 4 distinct vignettes
- Chatbots tasked with assigning a triage label to each vignette
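The design described above (28 groups of 4 vignettes, four ordered triage labels) can be sketched in a few lines of Python. This is only an illustrative sketch, not the study's code or scoring method: the `Vignette` class, the `build_placeholder_set` and `score` functions, and the accuracy, under-triage, and over-triage metrics are all hypothetical names and choices. The placeholder set also assumes one vignette per triage level in each group, which the summary does not state outright.

```python
from dataclasses import dataclass

# Triage labels ordered from least to most urgent, as listed in the summary.
TRIAGE_LEVELS = ["A", "B", "C", "D"]
RANK = {label: i for i, label in enumerate(TRIAGE_LEVELS)}


@dataclass(frozen=True)
class Vignette:
    group_id: int    # hypothetical index of the presentation-by-risk group (0..27)
    text: str        # the single-message disclosure shown to the chatbot
    gold_label: str  # reference triage label, one of TRIAGE_LEVELS


def build_placeholder_set(n_groups=28):
    """Build a stand-in vignette set with the reported shape (28 x 4 = 112).

    Assumption: each group holds one vignette per triage level.
    The texts are placeholders, not real clinical content.
    """
    return [
        Vignette(group_id=g, text=f"placeholder g{g}-{label}", gold_label=label)
        for g in range(n_groups)
        for label in TRIAGE_LEVELS
    ]


def score(vignettes, predictions):
    """Compare predicted labels with reference labels.

    Returns the share of exact matches plus the shares of under-triage
    (predicted less urgent than the reference) and over-triage
    (predicted more urgent). These metrics are illustrative choices.
    """
    if not vignettes or len(vignettes) != len(predictions):
        raise ValueError("need one prediction per vignette, and at least one vignette")
    exact = under = over = 0
    for vignette, predicted in zip(vignettes, predictions):
        diff = RANK[predicted] - RANK[vignette.gold_label]
        if diff == 0:
            exact += 1
        elif diff < 0:
            under += 1
        else:
            over += 1
    n = len(vignettes)
    return {
        "n": n,
        "accuracy": exact / n,
        "under_triage": under / n,
        "over_triage": over / n,
    }


if __name__ == "__main__":
    vignettes = build_placeholder_set()
    # A trivial baseline that always answers "B" (assessment within 1 week).
    print(score(vignettes, ["B"] * len(vignettes)))
```

Separating under-triage from over-triage is a design choice made here because the two errors carry different risks in a triage setting: rating an emergency as routine is not equivalent to the reverse.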
Entities
Institutions
- arXiv