ARTFEED — Contemporary Art Intelligence

LEAF: A Living Benchmark for Event-Augmented Forecasting with LLMs

ai-technology · 2026-05-20

Researchers have introduced LEAF, the first living benchmark for event-augmented forecasting, designed to evaluate large language models (LLMs) in complex, real-world scenarios. It targets two limitations of existing benchmarks: they either lack multidimensional events or are confined to closed environments. LEAF uses a recursive retrieval agent system with dual-agent cross-validation to supply relevant auxiliary text for forecasting future event probabilities, trends, and time series. In evaluations of state-of-the-art proprietary and open-weight LLMs, the models were able to use signals from complex events to improve predictive performance, particularly in the stock domain.
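The summary does not spell out how the recursive retrieval agent or the dual-agent cross-validation works, so the following is only a toy sketch of one plausible reading: a retriever follows follow-up queries to a bounded depth, and a passage is kept only when two independent judges both accept it. Every name here (recursive_retrieve, the toy corpus, both judges, the "ACME" example) is a hypothetical stand-in, not LEAF's actual code or data.

```python
from typing import Callable, List, Optional, Set

# Hypothetical interfaces, assumed for illustration only.
Search = Callable[[str], List[str]]        # query -> candidate passages
Judge = Callable[[str, str], bool]         # (query, passage) -> relevant?
Expand = Callable[[str, str], List[str]]   # (query, passage) -> follow-up queries


def recursive_retrieve(
    query: str,
    search: Search,
    judge_a: Judge,
    judge_b: Judge,
    expand: Expand,
    max_depth: int = 2,
    _seen: Optional[Set[str]] = None,
) -> List[str]:
    """Collect passages accepted by both judges, recursing on follow-up queries."""
    seen = set() if _seen is None else _seen
    accepted: List[str] = []
    for passage in search(query):
        if passage in seen:
            continue  # never judge or return the same passage twice
        seen.add(passage)
        # Cross-validation step: both agents must agree the passage is relevant.
        if judge_a(query, passage) and judge_b(query, passage):
            accepted.append(passage)
            if max_depth > 0:
                for follow_up in expand(query, passage):
                    accepted.extend(
                        recursive_retrieve(
                            follow_up, search, judge_a, judge_b,
                            expand, max_depth - 1, seen,
                        )
                    )
    return accepted


# Toy stand-ins for a search backend and two LLM judges (all invented).
CORPUS = {
    "ACME stock": ["ACME earnings beat estimates", "ACME sponsors a marathon"],
    "ACME earnings": ["ACME raises full-year guidance", "ACME earnings beat estimates"],
}


def toy_search(query: str) -> List[str]:
    return CORPUS.get(query, [])


def keyword_judge(query: str, passage: str) -> bool:
    # Accepts passages that mention a finance-related keyword.
    return any(word in passage for word in ("earnings", "guidance"))


def length_judge(query: str, passage: str) -> bool:
    # Accepts passages with at least four whitespace-separated tokens.
    return len(passage.split()) >= 4


def toy_expand(query: str, passage: str) -> List[str]:
    # Issues one follow-up query when an earnings passage is found.
    if "earnings" in passage and query != "ACME earnings":
        return ["ACME earnings"]
    return []


evidence = recursive_retrieve(
    "ACME stock", toy_search, keyword_judge, length_judge, toy_expand
)
print(evidence)
```

In this toy run the marathon passage satisfies the length judge but not the keyword judge, so it is dropped, which is the point of requiring agreement. In a real system the two judges would presumably be separate LLM agents rather than string checks.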

Key facts

  • LEAF is the first living benchmark for event-augmented forecasting.
  • It evaluates LLMs on forecasting future event probabilities, trends, and time series.
  • Uses a recursive retrieval agent system with dual-agent cross-validation.
  • Evaluates state-of-the-art proprietary and open-weight LLMs.
  • LLMs can leverage signals from complex events to enhance predictive performance.
  • Performance gains were particularly notable in the stock domain.
  • Published on arXiv with ID 2605.16358.
  • Addresses data scarcity and the closed-environment focus of existing benchmarks.

Entities

Institutions

  • arXiv
