LEAF: A Living Benchmark for Event-Augmented Forecasting with LLMs
Researchers have introduced LEAF, the first living benchmark for event-augmented forecasting, designed to evaluate large language models (LLMs) in complex, real-world scenarios. LEAF addresses limitations of existing benchmarks, which either lack multidimensional events or are confined to closed environments. It uses a recursive retrieval agent system with dual-agent cross-validation to supply relevant auxiliary text for three forecasting tasks: future event probabilities, trends, and time series. Evaluations of state-of-the-art proprietary and open-weight LLMs showed that these models can leverage signals from complex events to improve predictive performance, particularly in the stock domain.
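The summary gives no implementation details for the retrieval system, so the following is only a minimal sketch of how recursive retrieval with dual-agent cross-validation might work. It assumes that each "agent" is a relevance judge, that a document is kept only when both judges accept it, and that accepted documents seed follow-up queries searched to a fixed depth. The names `recursive_retrieve`, `search`, `judge_a`, `judge_b`, and `refine`, and the toy corpus, are hypothetical; in a real system the judges would presumably be LLM calls rather than keyword rules.

```python
from typing import Callable, List, Optional, Set

# Hypothetical interfaces (assumptions, not LEAF's actual API):
Search = Callable[[str], List[str]]              # query -> candidate documents
Judge = Callable[[str, str], bool]               # (query, document) -> relevant?
Refine = Callable[[str, List[str]], List[str]]   # (query, accepted docs) -> follow-up queries


def recursive_retrieve(
    query: str,
    search: Search,
    judge_a: Judge,
    judge_b: Judge,
    refine: Refine,
    max_depth: int = 2,
    depth: int = 0,
    seen: Optional[Set[str]] = None,
) -> List[str]:
    """Collect auxiliary documents for a forecasting question.

    A document is kept only if two independent judges both accept it
    (a simple stand-in for dual-agent cross-validation). Accepted documents
    then seed follow-up queries, which are searched recursively up to max_depth.
    """
    seen = set() if seen is None else seen
    accepted: List[str] = []
    for doc in search(query):
        if doc in seen:
            continue  # skip documents already considered on another branch
        seen.add(doc)
        if judge_a(query, doc) and judge_b(query, doc):
            accepted.append(doc)
    if depth < max_depth and accepted:
        # refine() returns a new list, so extending `accepted` inside the loop is safe
        for follow_up in refine(query, accepted):
            accepted += recursive_retrieve(
                follow_up, search, judge_a, judge_b, refine, max_depth, depth + 1, seen
            )
    return accepted


# Toy usage with keyword-based stand-ins for the search engine and the two agents.
corpus = [
    "acme earnings beat forecasts",
    "acme supplier strike announced",
    "supplier strike ends early",
    "acme stock mascot photo",
]


def search(q: str) -> List[str]:
    words = set(q.split())
    return [d for d in corpus if words & set(d.split())]


def judge_a(q: str, d: str) -> bool:
    # Agent A: lexical check, the document must share a word with the query
    return bool(set(q.split()) & set(d.split()))


def judge_b(q: str, d: str) -> bool:
    # Agent B: topical check, reject documents containing off-topic words
    return not ({"mascot", "photo"} & set(d.split()))


def refine(q: str, docs: List[str]) -> List[str]:
    # Follow-up query = words of each accepted document not already in the query
    q_words = set(q.split())
    return [" ".join(w for w in d.split() if w not in q_words) for d in docs]


result = recursive_retrieve("acme stock", search, judge_a, judge_b, refine)
print(result)
```

Requiring agreement from both judges trades recall for precision: the off-topic "mascot" document passes the lexical judge but is dropped by the topical one, while the recursive step still reaches the related "supplier strike ends early" report that the original query would not have matched.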
Key facts
- LEAF is the first living benchmark for event-augmented forecasting.
- It evaluates LLMs on forecasting future event probabilities, trends, and time series.
- Uses a recursive retrieval agent system with dual-agent cross-validation.
- Evaluates state-of-the-art proprietary and open-weight LLMs.
- LLMs can leverage signals from complex events to enhance predictive performance.
- Covers the stock domain as one application, where gains from event signals were especially notable.
- Published on arXiv with ID 2605.16358.
- Addresses data scarcity and closed-environment limitations in existing benchmarks.
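To illustrate how the benefit of event signals could be quantified, the sketch below scores the same forecasts with and without retrieved event text, using one common metric per task type: Brier score for event probabilities, direction accuracy for trends, and mean absolute error for time series. These metrics, the function names, and all numbers are assumptions for illustration only; the summary does not state which metrics LEAF uses, and the values are not results from the paper.

```python
from typing import Dict, Sequence


def _check(a: Sequence, b: Sequence) -> None:
    # Guard against mismatched or empty inputs before averaging
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    _check(probs, outcomes)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def direction_accuracy(pred: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of trend labels (e.g. "up"/"down") predicted correctly (higher is better)."""
    _check(pred, actual)
    return sum(p == a for p, a in zip(pred, actual)) / len(pred)


def mean_absolute_error(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute deviation of a point forecast from the observed values (lower is better)."""
    _check(pred, actual)
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)


# Illustrative toy data only (not LEAF results): "base" forecasts see only the
# question or price history, "aug" forecasts also see retrieved event text.
event_outcomes = [1, 0, 1, 1]
event_base, event_aug = [0.5, 0.5, 0.5, 0.5], [0.8, 0.3, 0.6, 0.9]

trend_actual = ["up", "down", "up"]
trend_base, trend_aug = ["up", "up", "down"], ["up", "down", "up"]

series_actual = [100.0, 102.0, 101.0]
series_base, series_aug = [100.0, 100.0, 100.0], [100.5, 101.5, 101.0]

# Signs are arranged so that positive values mean the event-augmented forecasts did better.
improvement: Dict[str, float] = {
    "event_probability": brier_score(event_base, event_outcomes)
    - brier_score(event_aug, event_outcomes),
    "trend": direction_accuracy(trend_aug, trend_actual)
    - direction_accuracy(trend_base, trend_actual),
    "time_series": mean_absolute_error(series_base, series_actual)
    - mean_absolute_error(series_aug, series_actual),
}
print(improvement)
```

Orienting every difference so that a positive number means "augmented is better" lets error metrics (Brier, MAE) and accuracy metrics be compared on one scale of improvement.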
Entities
Institutions
- arXiv