ARTFEED — Contemporary Art Intelligence

DESG: Evaluating AI Therapist Responses Without LLM Judges

ai-technology · 2026-05-07

Researchers propose Dynamic Emotional Signature Graphs (DESG) for evaluating therapeutic response quality in mental-health dialogue systems, a task where direct LLM judges and text-similarity metrics fall short. The study finds that these conventional methods are poorly aligned with clinical direction, that is, whether a response moves the user toward regulation, leaves them unchanged, or reinforces deterioration. DESG is a model-agnostic evaluator that represents dialogue windows with decoupled clinical features, which the authors present as a more reliable basis for offline evaluation of AI therapists. The paper is available on arXiv under ID 2605.03472.

Key facts

  • Conversational AI therapists are increasingly used in psychological support settings.
  • Reliable offline evaluation of therapeutic response quality remains an open problem.
  • The paper studies multi-domain support-dialogue evaluation without relying on LLMs as final judges.
  • Direct LLM judges and symmetric text-similarity metrics are poorly aligned with therapeutic quality.
  • The target label depends on clinical direction: regulation, reframing, unchanged, or deterioration.
  • DESG represents dialogue windows with decoupled clinical features.
  • DESG is a model-agnostic evaluator.
  • The paper is available on arXiv as arXiv:2605.03472.
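
To make the point about symmetric metrics concrete, here is a minimal Python sketch. It is not the DESG method and does not come from the paper. The Jaccard overlap stands in for a generic symmetric text-similarity metric, and the `direction` helper, its tolerance, the example sentences, and the distress scores are all hypothetical, invented only for illustration. The sketch covers three of the four labels listed above; reframing is left out because a single scalar score cannot express it.

```python
def jaccard(a: str, b: str) -> float:
    """Symmetric token-overlap similarity: jaccard(a, b) == jaccard(b, a)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def direction(before: float, after: float, tol: float = 0.05) -> str:
    """Hypothetical directional label from two distress scores in [0, 1].

    Lower distress after the response counts as regulation, higher as
    deterioration, and a change within `tol` as unchanged.
    """
    delta = after - before
    if delta < -tol:
        return "regulation"
    if delta > tol:
        return "deterioration"
    return "unchanged"


# Invented user states, used only to illustrate the asymmetry.
calm = "i feel steady and able to cope today"
distressed = "i feel hopeless and unable to cope today"

# A symmetric metric gives the same score in both directions...
sim_forward = jaccard(distressed, calm)
sim_backward = jaccard(calm, distressed)

# ...while the directional label flips when the order is reversed.
label_forward = direction(before=0.8, after=0.3)
label_backward = direction(before=0.3, after=0.8)

print(sim_forward == sim_backward)
print(label_forward, label_backward)
```

Because the similarity score is identical whichever state comes first, it cannot by itself separate a dialogue that moves toward regulation from one that moves toward deterioration. That is the kind of misalignment the key facts describe.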

Entities

Institutions

  • arXiv
