ARTFEED — Contemporary Art Intelligence

PSI-Bench: New Framework Evaluates Depression Patient Simulators

other · 2026-04-30

A team of researchers has unveiled PSI-Bench, an automatic evaluation framework for the clinically relevant and interpretable assessment of depression patient simulators. The framework addresses two shortcomings of existing evaluations: they rely on LLM judges guided by vague prompts, and they do not effectively measure behavioral diversity. PSI-Bench provides diagnostics at the turn, dialogue, and population levels. The researchers benchmarked seven LLMs across two simulator frameworks and found that the simulators tend to produce overly long and lexically diverse responses, show reduced behavioral variability, resolve emotions too quickly, and follow a uniform negative trajectory. The initiative aims to improve mental health training by enabling more authentic and varied patient simulations.

Key facts

  • PSI-Bench is an automatic evaluation framework for depression patient simulators.
  • It provides interpretable, clinically grounded diagnostics.
  • Evaluation covers turn-, dialogue-, and population-level dimensions.
  • Seven LLMs were benchmarked across two simulator frameworks.
  • Simulators produce overly long and lexically diverse responses.
  • Simulators show reduced variability in behavior.
  • Emotions are resolved too quickly in simulations.
  • Simulators follow a uniform negative trajectory.
