ARTFEED — Contemporary Art Intelligence

AI Forecasting Errors Are Highly Correlated Across LLMs, Study Finds

ai-technology · 2026-05-06

A recent study posted to arXiv (2605.00844) finds that large language models (LLMs) from different developers make significantly correlated forecasting errors, undermining the independence that collective-intelligence approaches rely on. In Study 1, GPT-4o, Claude, and Gemini showed a mean pairwise error correlation of r = 0.77 (p < 0.001) across 568 resolved binary prediction questions, and the correlation stayed high (r = 0.78) after likely leaked questions were excluded. Study 2 asked whether this shared bias had spread to human crowds by tracking shifts in community forecasts around the ChatGPT launch in November 2022. Community predictions did move in the direction of LLM forecasts (r = 0.20, p = 0.007), but the shift was fully explained by rational updating toward the actual outcomes, so the authors found no evidence of bias transfer from AI to human forecasters. The results challenge the assumption that aggregating forecasts from multiple LLMs averages out their errors, since those errors are correlated rather than independent.
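To make the headline statistic concrete, here is a minimal sketch of how a mean pairwise error correlation could be computed over binary prediction questions. The model names, noise parameters, and synthetic data are illustrative assumptions, not the paper's actual data or code; the shared noise term simply mimics a bias component common to all models.

```python
import numpy as np

# Illustrative setup (not the study's data): three models forecast
# probabilities for 568 resolved binary questions, with a shared bias
# component plus model-specific noise.
rng = np.random.default_rng(0)
n_questions = 568
outcomes = rng.integers(0, 2, n_questions)      # resolved outcomes (0 or 1)
shared_bias = rng.normal(0, 0.15, n_questions)  # error component common to all models

forecasts = {
    model: np.clip(outcomes + shared_bias + rng.normal(0, 0.1, n_questions), 0, 1)
    for model in ["model_a", "model_b", "model_c"]
}

# A model's error on each question: forecast probability minus actual outcome.
errors = {model: f - outcomes for model, f in forecasts.items()}

# Mean pairwise Pearson correlation of errors across all model pairs.
names = list(errors)
pair_corrs = [
    np.corrcoef(errors[a], errors[b])[0, 1]
    for i, a in enumerate(names)
    for b in names[i + 1:]
]
mean_r = float(np.mean(pair_corrs))
```

Because every model's error contains the same shared component, the pairwise correlations come out well above zero, which is the pattern the study reports across real LLMs.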

Key facts

  • Study published on arXiv with ID 2605.00844
  • Tested GPT-4o, Claude, and Gemini on 568 resolved binary prediction questions
  • Mean pairwise error correlation r = 0.77 (p < 0.001)
  • Correlation remained high (r = 0.78) after excluding likely leaked questions
  • Study 2 tracked community prediction shifts across ChatGPT launch (November 2022)
  • Community forecasts moved in LLM-predicted direction (r = 0.20, p = 0.007)
  • Shift fully explained by rational updating toward ground truth
  • No evidence of bias transmission from AI to human forecasters

Entities

Institutions

  • arXiv
