ARTFEED — Contemporary Art Intelligence

LLM Miscalibration in Social Science Measurement

ai-technology · 2026-05-13

A new paper on arXiv (2605.11954) investigates miscalibration in large language models used for social science measurement. The study audits how the confidence scores reported by models such as GPT-5-mini and DeepSeek-V3.2 align with tolerance-based correctness across 14 social science constructs, and a case study on FOMC shows that filtering observations by model confidence can alter downstream regression estimates. The authors propose a soft label distillation pipeline as a mitigation strategy.
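The filtering effect described above can be illustrated with a toy simulation. This is not the paper's data or pipeline; it is a minimal sketch, assuming a synthetic setup in which label quality is correlated with reported confidence, so that dropping low-confidence labels shifts a downstream regression slope.

```python
import numpy as np

# Illustrative sketch only, NOT the paper's method: half the synthetic
# "LLM" labels carry attenuated measurement error, and reported
# confidence happens to correlate with that error, so confidence-based
# filtering changes the regression estimate.
rng = np.random.default_rng(0)

n = 2000
x = rng.normal(size=n)                              # regressor of interest
y_true = 0.5 * x + rng.normal(scale=0.3, size=n)    # true outcome

# Labels: half attenuated toward zero, half accurate.
low_quality = rng.random(n) < 0.5
y_hat = np.where(low_quality, 0.4 * y_true, y_true) \
    + rng.normal(scale=0.2, size=n)

# Reported confidence is (by construction) lower for poor labels.
conf = np.where(low_quality,
                rng.uniform(0.5, 0.8, n),
                rng.uniform(0.7, 1.0, n))

def ols_slope(x, y):
    """Slope of a univariate OLS fit of y on x."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

slope_all = ols_slope(x, y_hat)

keep = conf >= 0.8                                  # confidence filtering
slope_filtered = ols_slope(x[keep], y_hat[keep])

print(f"slope on all labels:   {slope_all:.3f}")
print(f"slope after filtering: {slope_filtered:.3f}")
```

In this toy setup the filtered sample recovers a slope near the true 0.5 while the full sample is attenuated; with miscalibrated confidence the direction and size of such shifts become unpredictable, which is the risk the paper flags.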

Key facts

  • arXiv paper 2605.11954 studies miscalibration in LLM-based social science measurement.
  • Case study on FOMC shows confidence filtering changes regression estimates.
  • Audits calibration across 14 social science constructs.
  • Models include GPT-5-mini and DeepSeek-V3.2.
  • Reported confidence poorly aligned with tolerance-based correctness.
  • Proposes soft label distillation pipeline as mitigation.
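The mitigation named in the last bullet can be sketched generically. The paper's exact pipeline is not described here, so the following assumes only the core idea of soft label distillation: a student model is trained against a teacher's class probabilities ("soft labels") rather than hard 0/1 labels.

```python
import numpy as np

# Generic soft-label distillation sketch (not the paper's pipeline):
# a logistic-regression student is fit to a teacher's probabilities
# by gradient descent on cross-entropy. The gradient has the same
# form as with hard labels, with teacher probabilities in place of
# 0/1 targets.
rng = np.random.default_rng(1)

n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-in "teacher": noisy but informative probabilities.
teacher_probs = sigmoid(X @ w_true + rng.normal(scale=0.5, size=n))

# Student: gradient descent on soft-target cross-entropy.
w = np.zeros(d)
lr = 0.5
for _ in range(500):
    p = sigmoid(X @ w)
    grad = X.T @ (p - teacher_probs) / n
    w -= lr * grad

student_probs = sigmoid(X @ w)
print("mean |student - teacher| prob gap:",
      float(np.abs(student_probs - teacher_probs).mean()))
```

Because the student fits the full probability distribution instead of thresholded labels, its output probabilities inherit the teacher's graded uncertainty, which is the calibration-friendly property such distillation targets.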

Entities

Institutions

  • arXiv

Sources