FutureSim: AI Agents Tested on Real-World Event Forecasting
Researchers have developed FutureSim, a simulation framework that replays real-world events in chronological order to evaluate how AI agents adapt to new information. Agents receive news articles and questions that resolve over a three-month period, January to March 2026, testing their ability to forecast world events beyond their knowledge cutoff. In evaluations, the best-performing agent reached only 25% accuracy, and many agents earned a Brier skill score worse than they would have by making no prediction at all. The results show a clear separation in adaptive capabilities among frontier AI agents and position FutureSim as a realistic setting for studying adaptive AI. The work is detailed in a paper on arXiv (ID: 2605.15188).
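To make the "worse than no prediction" comparison concrete, here is a minimal Python sketch of a Brier skill score. It is an illustration, not FutureSim's scoring code: this summary does not give the paper's exact scoring rules, so the sketch assumes binary questions, the standard Brier score (mean squared error between forecast probability and outcome), and a constant 0.5 forecast as the stand-in for "no prediction". The function names and the sample forecasts are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, reference_prob=0.5):
    """1 - BS / BS_ref: positive beats the reference, negative is worse.

    Assumption: the reference forecaster always answers `reference_prob`,
    used here as a proxy for making no prediction.
    """
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score([reference_prob] * len(outcomes), outcomes)
    if bs_ref == 0:
        raise ValueError("reference forecast is perfect; skill score is undefined")
    return 1.0 - bs / bs_ref


# Hypothetical data: four resolved yes/no questions (1 = happened, 0 = did not).
outcomes = [1, 0, 1, 0]
confident_and_wrong = [0.1, 0.9, 0.2, 0.8]
roughly_calibrated = [0.8, 0.2, 0.7, 0.3]

print(brier_skill_score(confident_and_wrong, outcomes))  # negative: worse than the reference
print(brier_skill_score(roughly_calibrated, outcomes))   # positive: better than the reference
```

Under these assumptions, an agent that is confidently wrong ends up with a negative skill score, which is how a forecaster can do worse than one that never commits to an answer.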
Key facts
- FutureSim replays real-world events in chronological order to test AI agents.
- Agents forecast events beyond their knowledge cutoff using real news articles.
- Evaluation period: January to March 2026.
- Best agent accuracy: 25%.
- Many agents earned a worse Brier skill score than they would have by making no prediction at all.
- The study reveals a clear separation in adaptive capabilities among frontier AI agents.
- Paper available on arXiv with ID 2605.15188.
- FutureSim offers a realistic setting for studying adaptive AI.
Entities
Institutions
- arXiv