ARTFEED — Contemporary Art Intelligence

Dr-CiK Benchmark Tests AI Agents on Real-World Forecasting

ai-technology · 2026-05-28

Researchers have introduced Dr-CiK, a new benchmark designed to evaluate whether AI agents can autonomously retrieve and use supporting context for time series forecasting. Unlike existing benchmarks, which assume the context is already provided, Dr-CiK requires agents to search a document corpus, filter out irrelevant information, distill useful evidence, and generate forecasts. Tests combining state-of-the-art deep research and forecasting methods show that high-quality context substantially improves forecasting performance, but most deep research agents recover only a small fraction of the ground-truth context. The benchmark aims to bridge the gap between controlled forecasting tasks and real-world scenarios where context must be actively discovered from noisy, heterogeneous sources.
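
The four stages described above (retrieve, filter, distill, forecast) can be illustrated with a toy sketch. Nothing here comes from the Dr-CiK paper or its code: the `Document` class, the function names, the keyword rules, and the sample corpus are all hypothetical stand-ins chosen to show how the stages chain together, not how any evaluated agent works.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Document:
    doc_id: str
    text: str


def retrieve(corpus, query_terms, top_k=3):
    """Toy keyword search: rank documents by how many query terms they contain."""
    def score(doc):
        words = set(doc.text.lower().split())
        return sum(term in words for term in query_terms)

    ranked = sorted(corpus, key=score, reverse=True)
    return [doc for doc in ranked[:top_k] if score(doc) > 0]


def filter_relevant(docs, target):
    """Drop retrieved documents that never mention the forecast target."""
    return [doc for doc in docs if target in doc.text.lower()]


def distill(docs):
    """Reduce the surviving documents to one multiplicative adjustment (assumed rule)."""
    adjustment = 0.0
    for doc in docs:
        text = doc.text.lower()
        if "promotion" in text:
            adjustment += 0.2
        if "closure" in text:
            adjustment -= 0.5
    return adjustment


def forecast(history, adjustment, horizon=2):
    """Naive baseline (mean of the last three points) scaled by the distilled context."""
    baseline = mean(history[-3:])
    return [baseline * (1 + adjustment)] * horizon


# Hypothetical corpus: one useful document and two distractors.
corpus = [
    Document("a", "store sales promotion starts next week"),
    Document("b", "local football results and weather"),
    Document("c", "sales of umbrellas in another city"),
]

hits = retrieve(corpus, ["sales", "promotion"])
evidence = filter_relevant(hits, "store")
prediction = forecast([100.0, 110.0, 90.0], distill(evidence))
print([doc.doc_id for doc in evidence], prediction)
```

In this sketch the search stage returns both sales-related documents, the filter discards the one about another city, and the remaining promotion notice lifts the naive forecast. A real agent would replace each stage with far stronger components, but the dependency is the same: a forecast can only benefit from context that survives the earlier stages.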

Key facts

  • Dr-CiK is a benchmark for evaluating context-aided forecasting agents.
  • It requires agents to retrieve, filter, distill, and use context from a document corpus.
  • Existing benchmarks assume supporting context is already provided.
  • High-quality context substantially improves forecasting performance in Dr-CiK.
  • Most deep research agents recover only a small fraction of ground-truth context.
  • The benchmark addresses real-world forecasting where context must be actively discovered.
  • State-of-the-art deep research and forecasting methods were evaluated.
  • The study was published on arXiv with ID 2605.27904.
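
The finding that agents recover "only a small fraction of ground-truth context" can be made concrete with a simple recall calculation. This summary does not say how Dr-CiK scores recovered context, so the set-based recall below is an assumption, and the function name and document IDs are hypothetical.

```python
def context_recall(retrieved_ids, ground_truth_ids):
    """Fraction of ground-truth context documents that the agent retrieved.

    Assumed metric for illustration; the benchmark's actual scoring may differ.
    """
    truth = set(ground_truth_ids)
    if not truth:
        raise ValueError("ground-truth context must not be empty")
    return len(truth & set(retrieved_ids)) / len(truth)


# Hypothetical run: the agent returns three documents, one of which is relevant.
retrieved = ["d2", "d7", "d9"]
ground_truth = ["d1", "d2", "d3", "d4"]
print(context_recall(retrieved, ground_truth))  # 0.25
```

Under this assumed metric, an agent that finds one of four relevant documents scores 0.25, however many irrelevant documents it also returns.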

Entities

Institutions

  • arXiv
