PSI-Bench: New Framework Evaluates Depression Patient Simulators

other · 2026-04-30

A team of researchers has introduced PSI-Bench, an automatic evaluation framework that provides clinically grounded, interpretable diagnostics of depression patient simulators. It targets two shortcomings of existing evaluations: they rely on LLM judges with vague prompts, and they do not measure behavioral diversity. PSI-Bench reports diagnostics at the turn, dialogue, and population levels. In a benchmark of seven LLMs across two simulator frameworks, the simulators produced overly long and lexically diverse responses, showed reduced behavioral variability, resolved emotions too quickly, and followed a uniform negative trajectory. The work aims to improve mental health training by enabling more realistic and varied patient simulations.

Key facts

PSI-Bench is an automatic evaluation framework for depression patient simulators.
It provides interpretable, clinically grounded diagnostics.
Evaluation covers turn-, dialogue-, and population-level dimensions.
Seven LLMs were benchmarked across two simulator frameworks.
Simulators produce overly long and lexically diverse responses.
Simulators show reduced variability in behavior.
Emotions are resolved too quickly in simulations.
Simulators follow a uniform negative trajectory.
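
To make the three evaluation levels concrete, here is a minimal sketch of how length and lexical diversity could be measured per turn, averaged per dialogue, and compared across a population of simulated patients. The source does not describe PSI-Bench's actual metrics, so everything here is an assumption for illustration: whitespace tokenization, the type-token ratio as the diversity measure, and the function names turn_metrics, dialogue_metrics, and population_spread are all hypothetical.

```python
from statistics import mean, pstdev


def turn_metrics(turn: str) -> dict:
    """Turn level: token count and type-token ratio (assumed metrics)."""
    tokens = turn.lower().split()  # naive whitespace tokenization (assumption)
    ttr = len(set(tokens)) / len(tokens) if tokens else 0.0
    return {"length": len(tokens), "ttr": ttr}


def dialogue_metrics(patient_turns: list[str]) -> dict:
    """Dialogue level: average the turn metrics over one non-empty dialogue."""
    per_turn = [turn_metrics(t) for t in patient_turns]
    return {
        "mean_length": mean(m["length"] for m in per_turn),
        "mean_ttr": mean(m["ttr"] for m in per_turn),
    }


def population_spread(dialogues: list[list[str]]) -> float:
    """Population level: standard deviation of mean turn length across dialogues.

    A small value would suggest the simulated patients behave alike.
    """
    return pstdev(dialogue_metrics(d)["mean_length"] for d in dialogues)
```

Under these assumptions, comparing the same numbers computed on simulated and real patient dialogues would show whether simulator turns run longer, use more varied vocabulary, or vary less from patient to patient.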

Sources

arXiv cs.AI — 2026-04-29