ARTFEED — Contemporary Art Intelligence

Research warns of analytic flexibility risks in using LLMs as human data substitutes

ai-technology · 2026-04-20

Social scientists increasingly use large language models (LLMs) to generate synthetic datasets, known as silicon samples, intended to stand in for human respondents in research. A new study examines how the many analytic decisions involved in producing these samples affect how closely they correspond to real human data. Across two studies, the researchers evaluated 252 distinct silicon-sample configurations in a controlled case study involving two social-psychological scales, testing whether each configuration could recover participants' rankings, the distribution of their responses, and the correlation between the two scales. Results varied substantially on all three criteria, and configurations that performed well on one criterion often performed poorly on another. The researchers then extended the analysis to a published real-world application of silicon samples. The choices that shape outcomes include which model is used, its sampling parameters, how the prompt is formatted, and whether demographic or contextual details are included. The work highlights the methodological challenges of using AI to simulate human responses in social science research.

Key facts

  • Social scientists use large language models to create synthetic datasets called silicon samples.
  • The study examines how analytic choices affect correspondence between silicon samples and human data.
  • 252 silicon-sample configurations were generated for a controlled case study.
  • Two social-psychological scales were used in the evaluation.
  • Configurations were assessed on recovering participant rankings, response distributions, and between-scale correlations.
  • Substantial variation was found across all three evaluation criteria.
  • Configurations that performed well on one dimension often performed poorly on another.
  • The analysis was extended to a published silicon-sample use case.
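The summary does not say which statistics the study used for its three criteria, so the sketch below is only an illustration under stated assumptions: Spearman rank correlation for recovering participant rankings, total variation distance for response distributions, and the absolute difference in Pearson correlations for the between-scale relationship. It also assumes each silicon response is paired with one specific human participant (position i in every list is the same person or persona). The function names and the toy 1-to-5 responses are hypothetical and not taken from the paper.

```python
import math
from collections import Counter
from statistics import mean

# Hypothetical metric choices; the study's actual statistics are not specified in this summary.


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # e.g. a silicon sample that gives every persona the same answer
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def total_variation(x, y):
    """Total variation distance between two empirical response distributions (0 = identical, 1 = disjoint)."""
    cx, cy = Counter(x), Counter(y)
    levels = set(cx) | set(cy)
    return 0.5 * sum(abs(cx[v] / len(x) - cy[v] / len(y)) for v in levels)


def evaluate(human_a, human_b, silicon_a, silicon_b):
    """Score one hypothetical silicon-sample configuration on three criteria."""
    return {
        # 1. Does the silicon sample order participants the way their human answers do?
        "rank_recovery": spearman(human_a, silicon_a),
        # 2. How far apart are the human and silicon response distributions on scale A?
        "distribution_tv": total_variation(human_a, silicon_a),
        # 3. How much does the scale A / scale B correlation differ between the samples?
        "correlation_gap": abs(pearson(human_a, human_b) - pearson(silicon_a, silicon_b)),
    }


# Toy 1-to-5 responses for five participants (invented for illustration only).
human_a = [1, 2, 2, 3, 3]
human_b = [2, 1, 3, 3, 4]

# Configuration 1: shifts every answer upward, keeping who-answered-what intact.
silicon_a = [a + 2 for a in human_a]
silicon_b = [b + 1 for b in human_b]
print(evaluate(human_a, human_b, silicon_a, silicon_b))

# Configuration 2: reproduces the answer distribution exactly but reverses who gave which answer.
print(evaluate(human_a, human_b, human_a[::-1], human_b[::-1]))
```

In this toy data, the shifted configuration keeps participant order and the between-scale correlation but its response distribution barely overlaps the human one, while the reversed configuration matches the distribution exactly yet ranks participants backwards. It is a miniature version of the pattern the study reports, where doing well on one criterion says little about the others.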

Entities

Institutions

  • arXiv
