LLM Survey Simulation: Quantifying Human-LLM Misalignment Uncertainty

ai-technology · 2026-05-22

A new framework described in an arXiv preprint (2502.17773v5) addresses the challenge of using large language models (LLMs) to simulate human survey responses. The authors develop a method that converts LLM-simulated data into reliable confidence sets for population parameters, explicitly accounting for the uncertainty caused by human-LLM misalignment. The key innovation is a data-driven rule that adaptively selects the number of simulated responses: too few yield overly wide, uninformative sets, while too many produce narrow sets that concentrate on the LLM's answer rather than the human one and therefore have poor coverage. The selected sample size can be read as the effective human population size the LLM can represent. The result is a principled way to use synthetic data for inference while maintaining statistical validity.
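The width-versus-coverage tradeoff described above can be illustrated with a small calculation. The sketch below is not taken from the paper. It assumes a single yes/no survey question, a hypothetical human "yes" rate of 0.50, a hypothetical LLM "yes" rate of 0.56, and a standard normal-approximation interval built from k simulated responses. The function name `interval_tradeoff` and all numbers are illustrative assumptions.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def interval_tradeoff(p_human: float, p_llm: float, k: int, z: float = 1.96):
    """Approximate half-width and coverage of a normal-approximation interval.

    The interval is centred on the mean of k simulated responses, so it
    concentrates around p_llm (the LLM's rate), not p_human (the target).
    Coverage is computed under the normal approximation to that mean.
    """
    se = math.sqrt(p_llm * (1.0 - p_llm) / k)  # standard error of the simulated mean
    half_width = z * se
    shift = (p_human - p_llm) / se  # misalignment measured in standard errors
    coverage = normal_cdf(shift + z) - normal_cdf(shift - z)
    return half_width, coverage


# Hypothetical rates, chosen only to make the effect visible.
for k in (10, 100, 1000, 10000):
    hw, cov = interval_tradeoff(p_human=0.50, p_llm=0.56, k=k)
    print(f"k={k:>5}  half-width={hw:.3f}  approx. coverage of human rate={cov:.3f}")
```

Under these assumptions the half-width shrinks as k grows, while the chance that the interval still contains the human rate falls toward zero once the interval becomes narrower than the misalignment gap.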

Key facts

arXiv paper 2502.17773v5 proposes a framework for LLM survey simulation.
The framework converts simulated responses into confidence sets for human population parameters.
It quantifies uncertainty from human-LLM misalignment.
A data-driven approach adaptively selects simulation sample size.
Too few simulations yield overly wide sets; too many yield narrow sets with poor coverage.
The selected sample size reflects the effective human population size the LLM can represent.
The method works regardless of LLM simulation fidelity or confidence set construction procedure.
The goal is to achieve nominal average-case coverage.
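One way to picture a data-driven choice of simulation sample size is sketched below. This is a simplified, hypothetical selection rule, not the paper's algorithm. It assumes a set of calibration questions for which the human answer rate is already known, builds a Hoeffding-style interval from the first k simulated responses to each question, and keeps the largest candidate k whose average coverage across questions stays at or above 1 - alpha. The data are synthetic, and names such as `select_k` and `hoeffding_interval` are illustrative.

```python
import math
import random


def hoeffding_interval(responses, alpha):
    """Two-sided Hoeffding interval for the mean of 0/1 responses."""
    k = len(responses)
    mean = sum(responses) / k
    half_width = math.sqrt(math.log(2.0 / alpha) / (2.0 * k))
    return mean - half_width, mean + half_width


def empirical_coverage(questions, k, alpha):
    """Fraction of calibration questions whose interval contains the human rate."""
    hits = 0
    for human_rate, simulated in questions:
        lo, hi = hoeffding_interval(simulated[:k], alpha)
        hits += lo <= human_rate <= hi
    return hits / len(questions)


def select_k(questions, candidates, alpha):
    """Hypothetical rule: largest candidate k with average coverage >= 1 - alpha."""
    coverage = {k: empirical_coverage(questions, k, alpha) for k in candidates}
    valid = [k for k in candidates if coverage[k] >= 1.0 - alpha]
    return (max(valid) if valid else None), coverage


# Synthetic calibration set (assumption): 200 yes/no questions, with each
# LLM rate offset from the human rate by at most 0.05.
rng = random.Random(0)
candidates = [25, 50, 100, 200, 400, 800, 1600, 3200]
questions = []
for _ in range(200):
    p_human = rng.uniform(0.3, 0.7)
    p_llm = p_human + rng.uniform(-0.05, 0.05)
    simulated = [1 if rng.random() < p_llm else 0 for _ in range(max(candidates))]
    questions.append((p_human, simulated))

selected_k, coverage = select_k(questions, candidates, alpha=0.10)
for k in candidates:
    print(f"k={k:>4}  average coverage={coverage[k]:.3f}")
print("selected k:", selected_k)
```

In this toy setup the selected k grows as the simulated misalignment shrinks, which mirrors the reading of the selected sample size as an effective human population size. A real calibration set would only provide estimates of the human rates, a complication this sketch ignores.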
