Multi-Agent AI Oracles Improve Prediction Market Resolution Accuracy

ai-technology · 2026-06-01

A recent study published on arXiv tests whether multi-agent LLM architectures resolve prediction market questions more accurately than single-model oracles. The authors compared two multi-agent designs, independent aggregation and deliberative consensus, against single-LLM baselines including GPT-5 Nano, DeepSeek V3, and Llama-3.3-70B, using 1,189 resolved questions from KalshiBench. All agents drew on a shared evidence layer retrieved through Exa, with retrieval filtered by publication date so that differences in accuracy reflect reasoning rather than retrieval quality. Independent aggregation with confidence-weighted voting achieved the highest accuracy, at 83.43 percent. The study frames existing oracle systems as a trade-off between fast but brittle automation and accurate but costly human arbitration, and its results suggest that multi-agent setups can self-correct and outperform single-LLM oracles.

Key facts

Study evaluates multi-agent AI oracle systems for prediction market resolution
Compared independent aggregation and deliberative consensus against single-LLM baselines
Baselines include GPT-5 Nano, DeepSeek V3, and Llama-3.3-70B
Tested on 1,189 resolved prediction market questions from KalshiBench
All agents share a common evidence layer through Exa
Retrieval filtered by publication date to isolate reasoning from retrieval quality
Independent aggregation with confidence-weighted voting achieves highest accuracy at 83.43 percent
Existing oracle systems trade off fast but brittle automation against accurate but costly human arbitration
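
To make the two mechanisms named above more concrete, here is a minimal Python sketch of date-filtered evidence followed by confidence-weighted voting. It is an illustration under stated assumptions, not the study's implementation: the names Evidence, AgentVote, filter_by_cutoff, and confidence_weighted_vote are hypothetical, the cutoff rule (keep items published on or before a given date) is assumed, and the weighting rule (sum each outcome's self-reported confidences) is one plausible reading of "confidence-weighted voting". The sample dates and confidences are invented for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Evidence:
    text: str
    published: date


@dataclass(frozen=True)
class AgentVote:
    agent: str
    outcome: str       # e.g. "YES" or "NO"
    confidence: float  # self-reported, assumed to lie in [0, 1]


def filter_by_cutoff(evidence, cutoff):
    """Keep only evidence published on or before the cutoff date (assumed rule)."""
    return [e for e in evidence if e.published <= cutoff]


def confidence_weighted_vote(votes):
    """Sum confidences per outcome; return (winner, winner's share of total weight)."""
    if not votes:
        raise ValueError("at least one vote is required")
    totals = defaultdict(float)
    for v in votes:
        if not 0.0 <= v.confidence <= 1.0:
            raise ValueError(f"confidence out of range for agent {v.agent}")
        totals[v.outcome] += v.confidence
    total_weight = sum(totals.values())
    if total_weight == 0:
        raise ValueError("all votes have zero confidence")
    # Sorting first makes ties deterministic: the alphabetically first outcome wins.
    winner = max(sorted(totals), key=totals.get)
    return winner, totals[winner] / total_weight


# Illustrative data only: one item falls after the cutoff and is dropped.
evidence = [
    Evidence("early report", date(2025, 1, 10)),
    Evidence("late report", date(2025, 3, 1)),
]
usable = filter_by_cutoff(evidence, cutoff=date(2025, 2, 1))

# Two low-confidence YES votes are outweighed by one high-confidence NO vote.
votes = [
    AgentVote("agent_a", "YES", 0.5),
    AgentVote("agent_b", "YES", 0.25),
    AgentVote("agent_c", "NO", 1.0),
]
winner, share = confidence_weighted_vote(votes)
print(len(usable), winner, round(share, 3))
```

In this sketch a simple majority vote would return YES, while the confidence-weighted rule returns NO, which shows how weighting lets a well-supported minority answer override uncertain agreement. Whether the study normalizes or calibrates confidences before aggregating is not stated in the summary above.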
