KellyBench: AI Models Lose Money in Sports Betting Benchmark

ai-technology · 2026-05-01

Researchers have introduced KellyBench, a benchmark that evaluates sequential decision-making in long-horizon, non-stationary environments, using sports betting markets as the testbed. The environment simulates the 2023-24 English Premier League season: agents must maximize long-term bankroll growth using detailed historical data, including advanced statistics, lineups, and public odds. Every frontier language model evaluated lost money on average across five seeds. The best-performing model averaged a return of -8%, and many models experienced ruin. The benchmark highlights how hard it is to adapt to a changing environment and to identify a market edge, two areas where current models fall short.
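
The benchmark's name presumably refers to the Kelly criterion, the classic staking rule for maximizing long-run bankroll growth, although the summary above does not say so explicitly. As a minimal sketch under that assumption, the Python below computes the Kelly stake for a single bet at decimal odds and settles it. The function names and the example probability are hypothetical and are not taken from the benchmark.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Fraction of the bankroll to stake under the Kelly criterion.

    p_win is the bettor's own estimate of the win probability (an assumption;
    a real agent would have to derive it from the data). decimal_odds is the
    bookmaker price, e.g. 2.5 returns 2.5x the stake on a win.
    """
    b = decimal_odds - 1.0  # net profit per unit staked on a win
    if b <= 0:
        return 0.0  # no payout, so nothing to bet on
    f = (b * p_win - (1.0 - p_win)) / b
    return max(f, 0.0)  # stake nothing when the estimated edge is negative


def settle(bankroll: float, stake_fraction: float,
           decimal_odds: float, won: bool) -> float:
    """Return the bankroll after one bet is resolved."""
    stake = bankroll * stake_fraction
    if won:
        return bankroll + stake * (decimal_odds - 1.0)
    return bankroll - stake


if __name__ == "__main__":
    # Hypothetical bet: the agent believes a team wins 60% of the time at odds 2.0.
    f = kelly_fraction(0.6, 2.0)
    print(f"stake fraction: {f:.2f}")
    print("bankroll after a win:", settle(100.0, f, 2.0, True))
    print("bankroll after a loss:", settle(100.0, f, 2.0, False))
```

The rule only pays off when the estimated probability is better than the one implied by the odds. If an agent's estimates are no better than the market's, the formula returns zero or leads it to overbet, which is one way to read the losses reported here.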

Key facts

KellyBench evaluates sequential decision-making in sports betting markets.
The environment simulates the 2023-24 English Premier League season.
Agents must maximize long-term bankroll growth using historical data.
Data includes advanced statistics, lineups, and public odds.
All frontier models evaluated lost money on average over five seeds.
The best-performing model achieved an average return of -8%.
Many models experienced ruin across seeds.
The benchmark tests adaptation to non-stationary environments.
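
The headline figures are averages over five seeds. As a rough illustration of how such a summary could be computed, the sketch below averages per-seed returns and counts ruined runs. The function name, the definition of ruin as a final bankroll at or below zero, and every bankroll figure are hypothetical, chosen only for illustration; none of them come from the benchmark.

```python
from statistics import mean


def summarize_runs(start: float, finals: list[float],
                   ruin_threshold: float = 0.0) -> dict:
    """Summarize one model's runs across seeds.

    start is the initial bankroll, finals the final bankroll of each seed.
    A run counts as ruined when its final bankroll is at or below
    ruin_threshold (an assumed definition).
    """
    returns = [(final - start) / start for final in finals]
    return {
        "mean_return": mean(returns),
        "ruined_runs": sum(final <= ruin_threshold for final in finals),
    }


if __name__ == "__main__":
    # Made-up final bankrolls for five seeds, starting from 1000 units.
    summary = summarize_runs(1000.0, [1100.0, 900.0, 0.0, 950.0, 1050.0])
    print(summary)
```

A single ruined seed contributes a return of -100%, so it can pull the average well below zero even when the other seeds finish close to break-even.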

Sources

arXiv cs.AI — 2026-05-01