DESG: Evaluating AI Therapist Responses Without LLM Judges

ai-technology · 2026-05-07

Researchers propose Dynamic Emotional Signature Graphs (DESG), a method for evaluating the quality of therapeutic responses in mental-health dialogue systems without relying on LLMs as final judges. The study finds that direct LLM judges and symmetric text-similarity metrics are poorly aligned with therapeutic quality, which depends on clinical direction: whether a response moves the user toward regulation or reframing, leaves them unchanged, or reinforces deterioration. DESG is a model-agnostic evaluator that represents dialogue windows with decoupled clinical features, with the aim of providing more reliable offline evaluation of AI therapists. The paper is available on arXiv as 2605.03472.

Key facts

Conversational AI therapists are increasingly used in psychological support settings.
Reliable offline evaluation of therapeutic response quality remains an open problem.
The paper studies multi-domain support-dialogue evaluation without relying on LLMs as final judges.
Direct LLM judges and symmetric text-similarity metrics are poorly aligned with therapeutic quality.
The target label depends on clinical direction: regulation, reframing, unchanged, or deterioration.
DESG represents dialogue windows with decoupled clinical features.
DESG is a model-agnostic evaluator.
The paper is available as arXiv:2605.03472.
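The summary does not describe DESG's internals, so the sketch below does not attempt to reproduce them. It only illustrates why a symmetric text-similarity metric cannot, on its own, express clinical direction: if sim(a, b) equals sim(b, a), then a turn that moves a user from distress toward calm and the reverse turn receive the same score, even though one would be labeled regulation and the other deterioration. Jaccard token overlap is used here as a stand-in symmetric metric (an assumption; the paper's baselines may differ), and the utterances and labels are hypothetical.

```python
from enum import Enum


class Direction(Enum):
    # The four clinical-direction labels listed in the key facts.
    REGULATION = "regulation"
    REFRAMING = "reframing"
    UNCHANGED = "unchanged"
    DETERIORATION = "deterioration"


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; symmetric by construction (jaccard(a, b) == jaccard(b, a))."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical user turns before and after an assistant reply (not from the paper).
distressed = "i feel worse and i cannot calm down"
settled = "i feel better and i can calm down"

# Hand-assigned labels for the two orderings (illustrative only).
pairs = [
    (distressed, settled, Direction.REGULATION),
    (settled, distressed, Direction.DETERIORATION),
]

for before, after, label in pairs:
    # The similarity is identical for both orderings, while the labels are opposite.
    print(f"{label.value:>13}: similarity={jaccard(before, after):.3f}")
```

Both orderings score 5/9 (about 0.556), so any evaluator that relies only on this kind of score has no signal for telling regulation from deterioration; a direction-aware label has to come from some asymmetric representation of the dialogue window.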
