ARTFEED — Contemporary Art Intelligence

Cognitive User Simulator Enhances Proactive Task-Oriented Dialogue

ai-technology · 2026-05-23

A new paper on arXiv (2605.22240) introduces the Cognitive User Simulator, a framework for improving proactive task-oriented dialogue (TOD) that models users as stratified personas with observable traits and hidden concerns. The authors argue that post-trained LLMs are inherently conservative, and that reward-shaping RL methods such as GRPO fail because they only re-weight the passive responses the policy already samples. Conditioning on latent user concerns, they argue, enables proactive behavior that sampling alone cannot achieve. The simulator generates faithful, diverse interactions and emits per-turn state dynamics that track persuasion progress. To exploit this signal, the paper also proposes Simulator-Induced Asymmetric-View Policy Learning. The work targets applications such as outbound sales, where an agent must steer the conversation toward acceptance within a bounded number of turns.
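
The re-weighting argument can be illustrated with a minimal sketch of the group-relative advantage that GRPO is commonly described as using: each sampled reply's reward is standardized against the other replies in its group. This is an illustration under assumptions, not the paper's code; the function name, the reward values, and the `eps` constant are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own sampling group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every reward in the group is identical.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group in which every sampled reply is passive and scores the same.
passive_group = [0.0, 0.0, 0.0, 0.0]
passive_adv = group_relative_advantages(passive_group)

# Hypothetical group in which one sampled reply is proactive and earns a reward.
mixed_group = [0.0, 0.0, 0.0, 1.0]
mixed_adv = group_relative_advantages(mixed_group)

print(passive_adv)
print(mixed_adv)
```

When every sample in a group is equally passive, the advantages are all zero and there is nothing to up-weight. Only when the policy happens to sample a proactive reply does that reply receive a positive advantage, which is consistent with the authors' point that re-weighting cannot create behavior the policy never samples.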

Key facts

  • arXiv paper 2605.22240 proposes the Cognitive User Simulator for proactive TOD
  • The authors argue that post-trained LLMs are inherently conservative in proactive tasks
  • The authors argue that GRPO struggles because it only re-weights passive policy samples
  • Latent user concerns are pivotal training-time signals for proactivity
  • Simulator models users as stratified personas with external traits and internal concerns
  • Simulator produces faithful, diverse interactions with per-turn state dynamics
  • Simulator-Induced Asymmetric-View Policy Learning is introduced
  • Target application is outbound sales, where the agent must reach acceptance within a bounded number of turns
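
The persona structure described above can be sketched as a small data model. This is a hypothetical illustration only: the summary does not specify the paper's data structures, so the class names, the fields, and the progress measure (fraction of hidden concerns addressed) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    traits: dict    # external traits, observable by the agent (assumed fields)
    concerns: list  # internal concerns, hidden from the agent (assumed fields)


@dataclass
class SimulatedUser:
    persona: Persona
    resolved: set = field(default_factory=set)
    history: list = field(default_factory=list)  # per-turn progress values

    def agent_view(self):
        """Return only what the agent may observe: the external traits."""
        return dict(self.persona.traits)

    def step(self, addressed):
        """Record one turn and return progress in [0, 1].

        Progress here is an assumed proxy: the fraction of hidden
        concerns that the agent has addressed so far.
        """
        self.resolved |= set(addressed) & set(self.persona.concerns)
        total = len(self.persona.concerns)
        progress = len(self.resolved) / total if total else 1.0
        self.history.append(progress)
        return progress


# Hypothetical outbound-sales persona with two hidden concerns.
user = SimulatedUser(Persona({"age_band": "30-40"}, ["price", "contract_length"]))
user.step(["price"])            # addresses a real concern
user.step(["warranty"])         # misses: not one of the hidden concerns
user.step(["contract_length"])  # addresses the remaining concern
print(user.agent_view(), user.history)
```

In this sketch the agent sees only `agent_view()`, while the per-turn `history` is available at training time, which mirrors the asymmetric split between observable traits and hidden concerns that the summary describes.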

Entities

Institutions

  • arXiv

Sources