Cognitive User Simulator Enhances Proactive Task-Oriented Dialogue
A new paper on arXiv (2605.22240) introduces the Cognitive User Simulator, a framework that models users as stratified personas with observable traits and hidden concerns to improve proactive task-oriented dialogue (TOD). The authors argue that post-trained LLMs are inherently conservative and that reward-shaping RL methods such as GRPO fall short because they only re-weight samples drawn from an already passive policy. By conditioning on latent user concerns, the simulator is meant to enable proactive behavior that sampling alone cannot reach. The simulator generates faithful, diverse interactions and emits per-turn state dynamics that track persuasion progress. To exploit this signal, the paper also proposes Simulator-Induced Asymmetric-View Policy Learning. The work targets applications such as outbound sales, where agents must steer conversations toward acceptance within a bounded number of turns.
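To make the persona idea concrete, here is a minimal Python sketch of a simulated user with observable traits, hidden concerns, and a per-turn state report. It is an illustration only, not the paper's implementation: the class names `Persona` and `SimulatedUser`, the field names, the `max_turns` default, and the rule that progress equals the fraction of hidden concerns addressed are all assumptions made for this example. The sketch also does not model how personas are stratified.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    # Observable traits the agent can see (hypothetical field name).
    external_traits: dict
    # Hidden concerns the agent has to uncover (hypothetical field name).
    internal_concerns: list


@dataclass
class SimulatedUser:
    persona: Persona
    max_turns: int = 6  # assumed turn budget, not taken from the paper
    resolved: set = field(default_factory=set)
    turn: int = 0

    def step(self, addressed_concerns):
        """Advance one turn and return a per-turn state dictionary."""
        self.turn += 1
        # Only concerns the persona actually holds count as resolved.
        for concern in addressed_concerns:
            if concern in self.persona.internal_concerns:
                self.resolved.add(concern)
        total = len(self.persona.internal_concerns)
        # Assumed progress rule: fraction of hidden concerns addressed.
        progress = len(self.resolved) / total if total else 1.0
        return {
            "turn": self.turn,
            "progress": progress,
            "accepted": progress == 1.0,
            "out_of_turns": self.turn >= self.max_turns,
        }


if __name__ == "__main__":
    user = SimulatedUser(Persona({"age_band": "30-40"}, ["price", "trust"]))
    print(user.step(["price"]))
    print(user.step(["trust"]))
```

In this toy version the agent only makes progress when it addresses a concern the user actually holds, which mirrors the summary's point that hidden concerns, rather than surface traits, drive the outcome.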
Key facts
- arXiv paper 2605.22240 proposes Cognitive User Simulator for proactive TOD
- The authors argue that post-trained LLMs are inherently conservative in proactive tasks
- GRPO struggles, in their account, because it only re-weights samples from a passive policy
- Latent user concerns are presented as pivotal training-time signals for proactivity
- Simulator models users as stratified personas with external traits and internal concerns
- Simulator produces faithful, diverse interactions with per-turn state dynamics
- Simulator-Induced Asymmetric-View Policy Learning is introduced
- Target application is outbound sales, where the agent must win acceptance within a bounded number of turns
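The GRPO point in the list above can be illustrated with the group-relative advantage that GRPO is generally described as using: each sampled response's reward is normalized against the mean and standard deviation of its own sampling group. The sketch below is a simplified stand-alone version, not code from the paper; the function name `group_relative_advantages`, the `eps` constant, and the choice of population standard deviation are assumptions for this example.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Assumption: population standard deviation plus a small eps for
    numerical stability; real implementations may differ in detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Rewards for a group of responses sampled from the current policy.
    print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
    # If every sampled response earns the same reward, all advantages vanish.
    print(group_relative_advantages([0.5, 0.5, 0.5]))
```

Advantages exist only for responses the policy actually sampled. If the policy never produces a proactive response, no advantage is ever assigned to one, so the update can only shift weight among the passive responses it already generates.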
Entities
Institutions
- arXiv