ARTFEED — Contemporary Art Intelligence

Study Questions LLMs as Human Surrogates in Behavioral Research Experiments

ai-technology · 2026-04-20

A study posted to arXiv investigates how well large language models (LLMs) can stand in for human subjects in behavioral research. The authors convert human data from a canonical survey experiment on accuracy perception into structured prompts, elicit LLM responses, and apply identical statistical analyses to both the human and synthetic datasets. The results show that while LLMs reproduce several directional trends observed in human responses, effect magnitudes and moderation patterns differ considerably across models. In short, off-the-shelf LLMs capture aggregate belief-updating patterns under controlled conditions but do not consistently match human-scale effects. The paper (arXiv:2604.15329v1) raises concerns about the reliability of LLM-generated data in experimental settings.
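The core comparison described above — computing the same effect statistic on human and LLM-generated ratings and checking direction versus magnitude — can be sketched as follows. The helper name and all rating values are invented for illustration; they are not data from the paper:

```python
import statistics

def effect_size(ratings_control, ratings_treatment):
    """Mean shift in 0-10 accuracy ratings between experimental conditions.

    The same function is applied to human and synthetic samples, mirroring
    the paper's use of identical statistical analyses on both datasets.
    """
    return statistics.mean(ratings_treatment) - statistics.mean(ratings_control)

# Toy numbers, invented purely for illustration
human_effect = effect_size([4, 5, 4, 6], [6, 7, 6, 8])  # 2.0
llm_effect = effect_size([5, 5, 6, 5], [6, 6, 6, 7])    # 1.0

# Same sign (directional agreement), different magnitude: the pattern
# the study reports for off-the-shelf LLMs.
assert (human_effect > 0) == (llm_effect > 0)
```

A real analysis would use regression with moderators rather than a raw mean difference, but the direction-versus-magnitude check is the same idea.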

Key facts

  • Large language models are increasingly used to simulate human responses in behavioral research
  • The study compares off-the-shelf LLM-generated responses with human responses from a canonical survey experiment on accuracy perception
  • Each human observation was converted into a structured prompt for LLMs
  • Models generated a single 0-10 outcome variable without task-specific training
  • Identical statistical analyses were applied to human and synthetic responses
  • LLMs reproduce several directional effects observed in humans
  • Effect magnitudes and moderation patterns vary across different models
  • Off-the-shelf LLMs capture aggregate belief-updating patterns under controlled conditions but do not consistently match human-scale effects
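The prompt-conversion step in the list above — rendering each human observation as a structured prompt and extracting a single 0-10 outcome from the model's reply — might look like the following minimal sketch. The prompt schema, field names, and example reply are assumptions for illustration, not the paper's actual template:

```python
import re

def build_prompt(observation):
    """Render one human observation as a structured prompt (hypothetical schema)."""
    return (
        "You are a survey respondent.\n"
        f"Condition: {observation['condition']}\n"
        f"Headline: {observation['headline']}\n"
        "On a scale from 0 to 10, how accurate is this headline? "
        "Reply with a single integer."
    )

def parse_rating(reply):
    """Extract the single 0-10 outcome variable from a model reply; None if malformed."""
    match = re.search(r"\b(10|[0-9])\b", reply)
    return int(match.group(1)) if match else None

# Example: one observation and a mocked model reply (no real API call here)
obs = {"condition": "treatment", "headline": "Scientists confirm X."}
prompt = build_prompt(obs)
print(parse_rating("I'd say 7 out of 10."))  # → 7
```

Because the study uses models without task-specific training, the prompt carries all the experimental context; parsing failures would simply yield missing observations in the synthetic dataset.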

Entities

Institutions

  • arXiv