ARTFEED — Contemporary Art Intelligence

RealUserSim: Grounded User Simulation Boosts Agent Benchmarking Fidelity

ai-technology · 2026-05-22

RealUserSim is a framework for improving the fidelity of LLM-based user simulators used in agent evaluation. According to the authors, conventional simulated users suffer from two problems: a Formalism Ceiling, under which their style matches that of real users only 6-8% of the time, and Directive Amplification, under which hand-crafted instructions produce exaggerated behavior. RealUserSim instead grounds simulation in real behavioral data, deriving 7,275 executable profiles from more than 14,000 real human-LLM conversations in the WildChat dataset. A fidelity benchmark (PT3) covering 600 conversations across 71+ domains shows grounded simulation raising match rates from 24.2% to 45.3% across five behavioral dimensions. The authors also report more realistic agent evaluations on TauBench with six simulator models. The paper is available on arXiv, ID 2605.20204.
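To make the match-rate figures concrete, here is a minimal sketch of how such a rate could be computed. It is not the paper's metric: the summary does not define it, so the sketch assumes a match rate is the share of (conversation, dimension) pairs on which a simulated user's behavior label agrees with the real user's. The dimension names `d1` to `d5` and both function names are hypothetical placeholders.

```python
from typing import Mapping, Sequence

# Hypothetical placeholders: the article says only that there are five
# behavioral dimensions, not what they are called.
DIMENSIONS = ("d1", "d2", "d3", "d4", "d5")

Labels = Mapping[str, str]  # dimension name -> behavior label for one conversation


def match_rate(simulated: Sequence[Labels], real: Sequence[Labels]) -> float:
    """Fraction of (conversation, dimension) pairs where the labels agree.

    Assumed definition, for illustration only.
    """
    if len(simulated) != len(real):
        raise ValueError("simulated and real must cover the same conversations")
    if not real:
        raise ValueError("at least one conversation is required")
    matches = sum(
        sim[dim] == ref[dim]
        for sim, ref in zip(simulated, real)
        for dim in DIMENSIONS
    )
    return matches / (len(real) * len(DIMENSIONS))


def gain(baseline_pct: float, grounded_pct: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative factor) between two rates."""
    return grounded_pct - baseline_pct, grounded_pct / baseline_pct


# Toy data: two conversations, labels invented for the example.
real_users = [{d: "a" for d in DIMENSIONS}, {d: "b" for d in DIMENSIONS}]
sim_users = [
    {"d1": "a", "d2": "a", "d3": "x", "d4": "x", "d5": "x"},  # 2 of 5 agree
    {d: "b" for d in DIMENSIONS},                             # 5 of 5 agree
]
toy_rate = match_rate(sim_users, real_users)  # 7 of 10 pairs agree

# The reported figures: 24.2% without grounding, 45.3% with it.
points, factor = gain(24.2, 45.3)
```

Applied to the reported numbers, the move from 24.2% to 45.3% is a gain of 21.1 percentage points, or roughly a 1.87-fold improvement.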

Key facts

  • RealUserSim is the first user simulation framework grounded in real behavioral data.
  • Traditional LLM simulators hit a Formalism Ceiling, matching real users' style only 6-8% of the time.
  • Directive Amplification causes unnatural behavioral extremes in hand-crafted simulations.
  • 7,275 executable behavioral profiles were extracted from 14,000+ WildChat conversations.
  • PT3 benchmark evaluates fidelity on 600 conversations across 71+ domains.
  • Grounded simulation raises match rate from 24.2% to 45.3%.
  • Agent evaluation uses TauBench with six simulator models.
  • Anti-leakage controls are implemented in the fidelity benchmark.

Entities

Institutions

  • arXiv
  • WildChat
  • TauBench
