LEAF: A Living Benchmark for Event-Augmented Forecasting with LLMs

ai-technology · 2026-05-20

Researchers introduced LEAF, the first living benchmark for event-augmented forecasting tasks, designed to evaluate large language models (LLMs) in complex, real-world scenarios. LEAF addresses limitations of existing benchmarks that lack multidimensional events or focus on closed environments. It uses a recursive retrieval agent system with dual-agent cross-validation to provide relevant auxiliary text for forecasting future event probabilities, trends, and time series. Evaluations of state-of-the-art proprietary and open-weight LLMs showed that these models can leverage signals from complex events to enhance predictive performance, particularly in the stock domain.

Key facts

LEAF is the first living benchmark for event-augmented forecasting.
It evaluates LLMs on three forecasting tasks: future event probabilities, trends, and time series.
It uses a recursive retrieval agent system with dual-agent cross-validation to supply relevant auxiliary text.
It evaluates state-of-the-art proprietary and open-weight LLMs.
LLMs can leverage signals from complex events to enhance predictive performance.
The performance gains are reported as particularly evident in the stock domain.
Published on arXiv with ID 2605.16358.
It addresses data scarcity and closed-environment limitations in existing benchmarks.
