BLF System Achieves State-of-the-Art Performance on ForecastBench with Bayesian Linguistic Forecasting

ai-technology · 2026-04-22

BLF (Bayesian Linguistic Forecaster) is an agentic system for binary forecasting that achieves state-of-the-art results on the ForecastBench benchmark. On 400 backtesting questions from the ForecastBench leaderboard, it outperforms all top public methods, including Cassi and GPT-5. BLF combines three techniques: a linguistic belief state that pairs numerical probability estimates with natural-language evidence summaries; hierarchical multi-trial aggregation, which combines K independent trials using logit-space shrinkage; and hierarchical calibration, which applies Platt scaling with a hierarchical prior. Rather than appending all retrieved evidence to an ever-growing context, as is common, BLF updates its linguistic belief state at every step of an iterative tool-use loop. The hierarchical calibration is designed to avoid over-shrinking extreme predictions for sources with skewed base rates. The system is described in arXiv:2604.18576v2.
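
As an illustration of the multi-trial aggregation step, the following is a minimal Python sketch, not the authors' implementation. It assumes the K trial probabilities are averaged in logit space and that the mean is then pulled toward a prior by a fixed factor. The function name aggregate_trials, the default shrink factor of 0.8, and the 0.5 prior are hypothetical choices, since the summary does not give the exact formula.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of p, clamped away from 0 and 1 to keep the value finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1 / (1 + math.exp(-z))


def aggregate_trials(probs, shrink=0.8, prior=0.5):
    """Combine K trial probabilities (hypothetical form of the aggregation).

    The trials are averaged in logit space, then the mean is shrunk toward
    the prior's logit by the factor `shrink` (1.0 means no shrinkage).
    """
    if not probs:
        raise ValueError("need at least one trial probability")
    mean_logit = sum(logit(p) for p in probs) / len(probs)
    prior_logit = logit(prior)
    return sigmoid(prior_logit + shrink * (mean_logit - prior_logit))


# Example: five independent trials on the same question.
trials = [0.82, 0.90, 0.75, 0.88, 0.93]
print(round(aggregate_trials(trials), 3))
```

With shrink=1.0 the result is the plain logit-space mean of the trials; smaller values pull the aggregate toward the prior.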

Key facts

BLF (Bayesian Linguistic Forecaster) achieves state-of-the-art performance on the ForecastBench benchmark
System outperforms all top public methods including Cassi and GPT-5
Tested on 400 backtesting questions from ForecastBench leaderboard
Uses linguistic belief state combining numerical probability estimates with natural-language evidence summaries
Implements hierarchical multi-trial aggregation with K independent trials and logit-space shrinkage
Employs hierarchical calibration through Platt scaling with hierarchical prior
Avoids common approach of appending all evidence to ever-growing context
Prevents over-shrinking extreme predictions for sources with skewed base rates
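
The hierarchical calibration listed above can be illustrated with a short Python sketch. It rests on several assumptions and is not the paper's method: it fits per-source Platt parameters (a, b) in sigmoid(a * logit(p) + b) as a MAP estimate under a Gaussian prior centered on shared global parameters, using plain gradient descent for brevity. The names fit_platt, calibrate, and strength are hypothetical.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of p, clamped away from 0 and 1 to keep the value finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def calibrate(p, a, b):
    """Platt scaling applied to a raw forecast p."""
    return sigmoid(a * logit(p) + b)


def fit_platt(probs, outcomes, prior=(1.0, 0.0), strength=0.0,
              lr=0.1, steps=2000):
    """MAP fit of Platt parameters for one source (illustrative only).

    `prior` holds the assumed global (a, b); `strength` is the weight of the
    Gaussian penalty pulling the source's parameters toward that prior.
    Plain gradient descent is used here; a library optimizer would be more
    robust on extreme inputs.
    """
    a, b = prior
    xs = [logit(p) for p in probs]
    scale = max(len(xs) + strength, 1.0)  # keeps the step size bounded
    for _ in range(steps):
        # Gradient of the penalty term.
        grad_a = strength * (a - prior[0])
        grad_b = strength * (b - prior[1])
        # Gradient of the log loss over this source's resolved questions.
        for x, y in zip(xs, outcomes):
            err = sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / scale
        b -= lr * grad_b / scale
    return a, b


# Example: a source whose questions resolve "yes" only 10% of the time.
raw = [0.5] * 10
resolved = [0] * 9 + [1]
a_s, b_s = fit_platt(raw, resolved, prior=(1.0, 0.0), strength=10.0)
print(calibrate(0.5, a_s, b_s))
```

In this sketch, a source with few resolved questions stays close to the global parameters, while a source with many questions and a skewed base rate can learn its own offset b rather than being forced onto a single shared calibration curve.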
