PSI-Bench: New Framework Evaluates Depression Patient Simulators
A team of researchers has unveiled PSI-Bench, an automatic evaluation framework for clinically grounded, interpretable assessment of depression patient simulators. The framework targets two shortcomings of existing evaluations: they rely on LLM judges with vague prompts, and they do not effectively measure behavioral diversity. PSI-Bench offers diagnostics at the turn, dialogue, and population levels. Benchmarking seven LLMs across two simulator frameworks showed that simulators tend to generate overly long and lexically diverse responses, show reduced behavioral variability, resolve emotions too quickly, and follow a uniform negative trajectory. The work aims to improve mental health training by enabling more authentic and varied patient simulations.
Key facts
- PSI-Bench is an automatic evaluation framework for depression patient simulators.
- It provides interpretable, clinically grounded diagnostics.
- Evaluation covers turn-, dialogue-, and population-level dimensions.
- Seven LLMs were benchmarked across two simulator frameworks.
- Simulators produce overly long and lexically diverse responses.
- Simulators show reduced variability in behavior.
- Emotions are resolved too quickly in simulations.
- Simulators follow a uniform negative trajectory.
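To make the three evaluation levels concrete, here is a minimal Python sketch of the kind of measurement they imply. It is not PSI-Bench's implementation: the function names, the regex tokenizer, and the choice of word count, type-token ratio, and standard deviation as metrics are all assumptions made for illustration.

```python
import re
import statistics

# Illustrative only: these metrics are assumptions, not PSI-Bench's definitions.

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def turn_metrics(response):
    """Turn level: response length in tokens and type-token ratio (TTR)."""
    tokens = tokenize(response)
    length = len(tokens)
    ttr = len(set(tokens)) / length if length else 0.0
    return {"length": length, "ttr": ttr}

def dialogue_metrics(patient_turns):
    """Dialogue level: average the turn metrics over one non-empty dialogue."""
    per_turn = [turn_metrics(t) for t in patient_turns]
    return {
        "mean_length": statistics.mean(m["length"] for m in per_turn),
        "mean_ttr": statistics.mean(m["ttr"] for m in per_turn),
    }

def population_spread(dialogues):
    """Population level: spread of mean response length across dialogues.

    A value near zero would suggest the simulated patients all respond
    at similar length, i.e. low behavioral variability on this one axis.
    """
    means = [dialogue_metrics(d)["mean_length"] for d in dialogues]
    return statistics.pstdev(means)
```

Under these assumptions, a simulator whose responses are much longer than real patient turns would show up in `mean_length`, while a near-zero `population_spread` would flag simulated patients that all behave alike. Comparing such numbers against real patient transcripts is what would make them interpretable.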