Cognitive User Simulator Enhances Proactive Task-Oriented Dialogue

ai-technology · 2026-05-23

A new paper on arXiv (2605.22240) introduces the Cognitive User Simulator, a framework that models users as stratified personas with observable traits and hidden concerns to improve proactive task-oriented dialogue (TOD). The authors argue that post-trained LLMs are inherently conservative and that reward-shaping RL methods like GRPO fail because they only re-weight samples drawn from an already passive policy. By conditioning on latent user concerns, they say, the simulator enables proactive behavior that sampling alone cannot reach. According to the paper, the simulator generates faithful, diverse interactions and emits per-turn state dynamics that track persuasion progress. The paper also proposes Simulator-Induced Asymmetric-View Policy Learning to exploit this signal. The work targets applications like outbound sales, where agents must steer conversations toward acceptance within a bounded number of turns.
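The re-weighting argument can be illustrated with a small sketch of group-relative advantages, the normalization GRPO is generally described as using. This is not code from the paper: the function name, the reward values, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own sampling group (GRPO-style).

    Assumption: population standard deviation plus a small eps;
    real implementations may differ in these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group where every sampled reply is passive and earns the same reward.
passive_group = [0.0, 0.0, 0.0, 0.0]

# Hypothetical group where one sampled reply happens to be proactive.
mixed_group = [0.0, 0.0, 1.0, 0.0]

passive_adv = group_relative_advantages(passive_group)
mixed_adv = group_relative_advantages(mixed_group)
```

When every sample in the group is equally passive, all advantages come out as zero and the update carries no direction. A signal appears only if a proactive reply is already among the samples, which is the limitation the authors attribute to this family of methods.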

Key facts

arXiv paper 2605.22240 proposes Cognitive User Simulator for proactive TOD
The authors argue that post-trained LLMs are inherently conservative in proactive tasks
GRPO struggles, per the authors, because it only re-weights samples from an already passive policy
Latent user concerns are presented as pivotal training-time signals for proactivity
Simulator models users as stratified personas with external traits and internal concerns
Simulator produces faithful, diverse interactions with per-turn state dynamics
Simulator-Induced Asymmetric-View Policy Learning is introduced
Target applications include outbound sales, where the agent must reach acceptance within a bounded number of turns
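A toy sketch of the persona idea, assuming a simple keyword match in place of the paper's actual simulator. Every name here (Persona, SimulatedUser, run_episode, the "progress" and "accepted" fields) is hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    traits: dict    # external traits the agent is allowed to observe
    concerns: list  # internal concerns, hidden from the agent


@dataclass
class SimulatedUser:
    persona: Persona
    resolved: set = field(default_factory=set)

    def agent_view(self):
        """Return only the observable side of the persona."""
        return dict(self.persona.traits)

    def step(self, agent_utterance):
        """Update the hidden state and emit a per-turn progress record.

        Assumption: a concern counts as addressed when its keyword
        appears in the agent's utterance. This is a stand-in, not the
        paper's mechanism.
        """
        text = agent_utterance.lower()
        for concern in self.persona.concerns:
            if concern in text:
                self.resolved.add(concern)
        total = len(self.persona.concerns)
        progress = len(self.resolved) / total if total else 1.0
        return {"progress": progress, "accepted": len(self.resolved) == total}


def run_episode(user, agent_turns, max_turns):
    """Play a bounded-turn episode and return the per-turn state trace."""
    trace = []
    for utterance in agent_turns[:max_turns]:
        state = user.step(utterance)
        trace.append(state)
        if state["accepted"]:
            break
    return trace


def make_user():
    return SimulatedUser(Persona(traits={"age_band": "30-40"},
                                 concerns=["price", "cancellation"]))


turns = ["Hello!", "The price is fixed for a year.", "Cancellation is free."]
full_trace = run_episode(make_user(), turns, max_turns=3)
short_trace = run_episode(make_user(), turns, max_turns=2)
```

With a budget of three turns the hypothetical agent addresses both hidden concerns and the episode ends in acceptance; with a budget of two it stalls halfway. The per-turn trace is the kind of intermediate signal the summary describes, in toy form.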
