ARTFEED — Contemporary Art Intelligence

Study Questions LLMs as Human Surrogates in Behavioral Research Experiments

ai-technology · 2026-04-20

A study posted to arXiv investigates how well large language models (LLMs) can stand in for human subjects in behavioral research. The authors convert human data from a canonical survey experiment on accuracy perception into structured prompts, elicit LLM responses, and apply identical statistical analyses to both the human and synthetic datasets. The results show that while LLMs reproduce several directional trends observed in human responses, effect magnitudes and moderation patterns differ considerably across models. In short, off-the-shelf LLMs capture aggregate belief-updating patterns under controlled conditions but do not consistently match human-scale effects. The paper (arXiv:2604.15329v1) raises concerns about the reliability of LLM-generated data in experimental settings.
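The core comparison described above — computing the same effect statistic on human and LLM-generated ratings and checking direction versus magnitude — can be sketched as follows. The helper name and all rating values are invented for illustration; they are not data from the paper:

```python
import statistics

def effect_size(ratings_control, ratings_treatment):
    """Mean shift in 0-10 accuracy ratings between experimental conditions.

    The same function is applied to human and synthetic samples, mirroring
    the paper's use of identical statistical analyses on both datasets.
    """
    return statistics.mean(ratings_treatment) - statistics.mean(ratings_control)

# Toy numbers, invented purely for illustration
human_effect = effect_size([4, 5, 4, 6], [6, 7, 6, 8])  # 2.0
llm_effect = effect_size([5, 5, 6, 5], [6, 6, 6, 7])    # 1.0

# Same sign (directional agreement), different magnitude: the pattern
# the study reports for off-the-shelf LLMs.
assert (human_effect > 0) == (llm_effect > 0)
```

A real analysis would use regression with moderators rather than a raw mean difference, but the direction-versus-magnitude check is the same idea.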

Key facts

  • Large language models are increasingly used to simulate human responses in behavioral research
  • The study compares off-the-shelf LLM-generated responses with human responses from a canonical survey experiment on accuracy perception
  • Each human observation was converted into a structured prompt for LLMs
  • Models generated a single 0-10 outcome variable without task-specific training
  • Identical statistical analyses were applied to human and synthetic responses
  • LLMs reproduce several directional effects observed in humans
  • Effect magnitudes and moderation patterns vary across different models
  • Off-the-shelf LLMs capture aggregate belief-updating patterns under controlled conditions but do not consistently match human-scale effects
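The prompt-conversion step in the list above — rendering each human observation as a structured prompt and extracting a single 0-10 outcome from the model's reply — might look like the following minimal sketch. The prompt schema, field names, and example reply are assumptions for illustration, not the paper's actual template:

```python
import re

def build_prompt(observation):
    """Render one human observation as a structured prompt (hypothetical schema)."""
    return (
        "You are a survey respondent.\n"
        f"Condition: {observation['condition']}\n"
        f"Headline: {observation['headline']}\n"
        "On a scale from 0 to 10, how accurate is this headline? "
        "Reply with a single integer."
    )

def parse_rating(reply):
    """Extract the single 0-10 outcome variable from a model reply; None if malformed."""
    match = re.search(r"\b(10|[0-9])\b", reply)
    return int(match.group(1)) if match else None

# Example: one observation and a mocked model reply (no real API call here)
obs = {"condition": "treatment", "headline": "Scientists confirm X."}
prompt = build_prompt(obs)
print(parse_rating("I'd say 7 out of 10."))  # → 7
```

Because the study uses models without task-specific training, the prompt carries all the experimental context; parsing failures would simply yield missing observations in the synthetic dataset.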

Entities

Institutions

  • arXiv