RealUserSim: Grounded User Simulation Boosts Agent Benchmarking Fidelity

ai-technology · 2026-05-22

RealUserSim is a framework for making LLM-based user simulation more faithful to real users in agent evaluation. Conventional simulated users run into two problems: a Formalism Ceiling, where their style matches that of real people only 6-8% of the time, and Directive Amplification, where hand-crafted simulations produce exaggerated behavior. RealUserSim instead grounds simulation in real behavioral data, deriving 7,275 executable profiles from more than 14,000 real human-LLM conversations in the WildChat dataset. Its fidelity benchmark, PT3, covers 600 conversations across 71+ domains and shows that grounded simulation raises the match rate from 24.2% to 45.3% across five behavioral dimensions. In agent evaluations on TauBench with six simulator models, the grounded simulators were also markedly more realistic. The paper is available on arXiv (ID 2605.20204).
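To put the headline numbers in context, here is a minimal sketch of how a match rate over behavioral dimensions might be computed, and what the reported jump amounts to. The summary does not say how PT3 scores a match, so the per-dimension label comparison, the function names, and the dimension names ("formality", "verbosity") are all assumptions made for illustration, not the paper's method.

```python
from typing import Mapping, Sequence, Tuple


def match_rate(
    simulated: Sequence[Mapping[str, str]],
    real: Sequence[Mapping[str, str]],
) -> float:
    """Fraction of (conversation, dimension) labels where the simulated
    user's label equals the real user's label. Hypothetical scoring rule."""
    total = 0
    hits = 0
    for sim, ref in zip(simulated, real):
        for dimension, ref_label in ref.items():
            total += 1
            if sim.get(dimension) == ref_label:
                hits += 1
    return hits / total if total else 0.0


def improvement(baseline_pct: float, grounded_pct: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    points = grounded_pct - baseline_pct
    relative = points / baseline_pct * 100
    return points, relative


# Toy data with made-up dimension names and labels (not from PT3).
real = [
    {"formality": "casual", "verbosity": "short"},
    {"formality": "casual", "verbosity": "long"},
]
simulated = [
    {"formality": "formal", "verbosity": "short"},
    {"formality": "casual", "verbosity": "long"},
]

print(f"toy match rate: {match_rate(simulated, real):.2f}")

# The reported figures: 24.2% without grounding, 45.3% with it.
points, relative = improvement(24.2, 45.3)
print(f"gain: {points:.1f} points, {relative:.1f}% relative")
```

On the reported figures, the gain is about 21.1 percentage points, or roughly an 87% relative improvement over the ungrounded baseline.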

Key facts

RealUserSim is the first user simulation framework grounded in real behavioral data.
Traditional LLM simulators hit a Formalism Ceiling, with style match rates of only 6-8%.
Directive Amplification causes unnatural behavioral extremes in hand-crafted simulations.
7,275 executable behavioral profiles were extracted from 14,000+ WildChat conversations.
PT3 benchmark evaluates fidelity on 600 conversations across 71+ domains.
Grounded simulation raises the match rate from 24.2% to 45.3% across five behavioral dimensions.
Agent evaluation uses TauBench with six simulator models.
Anti-leakage controls are implemented in the fidelity benchmark.
