ARTFEED — Contemporary Art Intelligence

AI Forecasting Errors Are Highly Correlated Across LLMs, Study Finds

ai-technology · 2026-05-06

A recent study posted to arXiv (2605.00844) finds that large language models (LLMs) from different developers make significantly correlated forecasting errors, undermining the independence that collective-intelligence approaches rely on. In Study 1, GPT-4o, Claude, and Gemini showed a mean pairwise error correlation of r = 0.77 (p < 0.001) across 568 resolved binary prediction questions, and the correlation stayed high (r = 0.78) after likely leaked questions were excluded. Study 2 asked whether this shared bias had spread to human crowds by tracking shifts in community forecasts around the ChatGPT launch in November 2022. Community predictions did move in the direction of LLM forecasts (r = 0.20, p = 0.007), but the shift was fully explained by rational updating toward the actual outcomes, so the authors found no evidence of bias transfer from AI to human forecasters. The results challenge the assumption that aggregating forecasts from multiple LLMs averages out their errors, since those errors are correlated rather than independent.
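To make the headline statistic concrete, here is a minimal sketch of how a mean pairwise error correlation could be computed over binary prediction questions. The model names, noise parameters, and synthetic data are illustrative assumptions, not the paper's actual data or code; the shared noise term simply mimics a bias component common to all models.

```python
import numpy as np

# Illustrative setup (not the study's data): three models forecast
# probabilities for 568 resolved binary questions, with a shared bias
# component plus model-specific noise.
rng = np.random.default_rng(0)
n_questions = 568
outcomes = rng.integers(0, 2, n_questions)      # resolved outcomes (0 or 1)
shared_bias = rng.normal(0, 0.15, n_questions)  # error component common to all models

forecasts = {
    model: np.clip(outcomes + shared_bias + rng.normal(0, 0.1, n_questions), 0, 1)
    for model in ["model_a", "model_b", "model_c"]
}

# A model's error on each question: forecast probability minus actual outcome.
errors = {model: f - outcomes for model, f in forecasts.items()}

# Mean pairwise Pearson correlation of errors across all model pairs.
names = list(errors)
pair_corrs = [
    np.corrcoef(errors[a], errors[b])[0, 1]
    for i, a in enumerate(names)
    for b in names[i + 1:]
]
mean_r = float(np.mean(pair_corrs))
```

Because every model's error contains the same shared component, the pairwise correlations come out well above zero, which is the pattern the study reports across real LLMs.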

Key facts

  • Study published on arXiv with ID 2605.00844
  • Tested GPT-4o, Claude, and Gemini on 568 resolved binary prediction questions
  • Mean pairwise error correlation r = 0.77 (p < 0.001)
  • Correlation remained high (r = 0.78) after excluding likely leaked questions
  • Study 2 tracked community prediction shifts across ChatGPT launch (November 2022)
  • Community forecasts moved in LLM-predicted direction (r = 0.20, p = 0.007)
  • Shift fully explained by rational updating toward ground truth
  • No evidence of bias transmission from AI to human forecasters

Entities

Institutions

  • arXiv
