ARTFEED — Contemporary Art Intelligence

Inverse Scaling: More Capable LLMs Produce Worse Forecasts on Superlinear Growth

ai-technology · 2026-05-23

A recent arXiv preprint (2605.22672) reports that more capable language models produce worse distributional forecasts in settings with superlinear growth and possible regime shifts, conditions that often arise in finance and epidemiology. The authors introduce ForecastBench-Sim (FBSim), a contamination-free simulation benchmark, and demonstrate the effect on synthetic SIR epidemic models alongside a matched linear control. The failure is concentrated in the upper tail, which more capable models raise to accommodate aggressive projections, while the lower tail remains stable. The same pattern appears in real-world data on COVID-19, measles, housing markets, and hyperinflation. A study of Llama-3.1 finds that both model scale and post-training contribute to the inverse scaling, and that domain knowledge does not reliably improve calibration.
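To make the contrast between an SIR-style series and a linear control concrete, here is a minimal Python sketch. It is not the FBSim generator: the daily-step update, the function names, and every parameter value (beta, gamma, population, slope) are assumptions chosen only for illustration.

```python
def simulate_sir(beta, gamma, population, initial_infected, days):
    """Daily-step SIR model (illustrative only).

    Returns cumulative infections for day 0 .. days.
    beta is the transmission rate, gamma the recovery rate.
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    cumulative = [float(initial_infected)]
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        cumulative.append(cumulative[-1] + new_infections)
    return cumulative


def simulate_linear(slope, start, days):
    """Linear control: the same kind of series, but with constant daily growth."""
    return [start + slope * t for t in range(days + 1)]


# Hypothetical parameter values, not taken from the paper.
sir = simulate_sir(beta=0.4, gamma=0.1, population=1_000_000,
                   initial_infected=10, days=60)
linear = simulate_linear(slope=10.0, start=10.0, days=60)

# Growth over two consecutive 10-day windows: it accelerates in the
# SIR series and stays constant in the linear control.
sir_growth = (sir[10] - sir[0], sir[20] - sir[10])
linear_growth = (linear[10] - linear[0], linear[20] - linear[10])
print(sir_growth, linear_growth)
```

In the early phase the SIR series grows faster with every window, while the linear control adds the same amount each time. A forecaster who extrapolates the SIR series has to decide how far that acceleration will continue, and that decision is what shapes the upper tail of the forecast.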

Key facts

  • Inverse scaling in LLMs on forecasting problems with superlinear growth and tail risk
  • ForecastBench-Sim (FBSim) released as a contamination-free benchmark
  • Failure concentrates at the upper tail of distributional forecasts
  • Replicates on COVID-19, measles, housing markets, and hyperinflation datasets
  • Llama-3.1 study shows scale and post-training both contribute to the effect
  • Domain knowledge does not reliably rescue calibration
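
One common way to see where a distributional forecast fails is to score each quantile separately with the pinball (quantile) loss. The sketch below assumes that metric purely for illustration; the source does not say which scoring rule the paper uses, and the forecast values and observed outcome are made up.

```python
def pinball_loss(quantile_level, predicted, observed):
    """Pinball (quantile) loss for one predicted quantile.

    A prediction below the outcome is penalized with weight quantile_level;
    a prediction above it with weight (1 - quantile_level).
    """
    diff = observed - predicted
    return max(quantile_level * diff, (quantile_level - 1.0) * diff)


# Hypothetical forecast: quantile level -> predicted value.
# The 0.95 quantile is deliberately inflated to mimic an overly
# aggressive upper tail.
forecast = {0.05: 900.0, 0.5: 1200.0, 0.95: 5000.0}
observed = 1300.0

losses = {q: pinball_loss(q, value, observed) for q, value in forecast.items()}
worst_quantile = max(losses, key=losses.get)
print(losses, worst_quantile)
```

In this made-up case the 0.95 quantile contributes the largest loss and the 0.05 quantile the smallest. That is the shape of the failure the key facts describe: an inflated upper tail next to a stable lower tail.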

Entities

Institutions

  • arXiv

Sources