Dr-CiK Benchmark Tests AI Agents on Real-World Forecasting

ai-technology · 2026-05-28

Researchers have introduced Dr-CiK, a new benchmark designed to evaluate whether AI agents can autonomously retrieve and use supporting context for time series forecasting. Unlike existing benchmarks, which assume the context is already provided, Dr-CiK requires agents to search a document corpus, filter out irrelevant information, distill useful evidence, and generate forecasts. Experiments combining state-of-the-art deep research and forecasting methods show that high-quality context substantially improves forecasting performance, but that most deep research agents recover only a small fraction of the ground-truth context. The benchmark aims to bridge the gap between controlled forecasting tasks and real-world scenarios in which context must be actively discovered from noisy, heterogeneous sources.

Key facts

Dr-CiK is a benchmark for evaluating context-aided forecasting agents.
It requires agents to retrieve, filter, distill, and use context from a document corpus.
Existing benchmarks assume supporting context is already provided.
High-quality context substantially improves forecasting performance in Dr-CiK.
Most deep research agents recover only a small fraction of ground-truth context.
The benchmark addresses real-world forecasting where context must be actively discovered.
State-of-the-art deep research and forecasting methods were evaluated.
The study was published on arXiv with ID 2605.27904.
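
To make the "small fraction of ground-truth context" finding concrete, here is a minimal sketch of one way such a fraction could be computed: as recall over document identifiers. This is an illustrative assumption, not the metric defined by the Dr-CiK authors, which this summary does not describe. The function name context_recall and the document IDs are hypothetical.

```python
def context_recall(retrieved_ids, ground_truth_ids):
    """Return the fraction of ground-truth context items the agent retrieved.

    Assumption: context items are identified by unique IDs and matching is
    exact. The actual Dr-CiK scoring may differ.
    """
    truth = set(ground_truth_ids)
    if not truth:
        raise ValueError("ground_truth_ids must not be empty")
    # Retrieved items that are not in the ground truth do not affect recall.
    return len(truth & set(retrieved_ids)) / len(truth)


# Hypothetical document IDs, for illustration only.
retrieved = ["doc_03", "doc_17", "doc_42", "doc_88"]
ground_truth = ["doc_17", "doc_21", "doc_35", "doc_60", "doc_91"]

recall = context_recall(retrieved, ground_truth)
print(f"context recall: {recall:.2f}")
```

In this made-up example the agent finds one of the five relevant documents, so it recovers a fifth of the ground-truth context even though it returned four documents in total.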
