ARTFEED — Contemporary Art Intelligence

Multi-Agent AI Oracles Improve Prediction Market Resolution Accuracy

ai-technology · 2026-06-01

A recent study posted on arXiv examines whether multi-agent LLM architectures can resolve prediction-market questions more accurately than single-model oracles. The authors compared two multi-agent designs, independent aggregation and deliberative consensus, against single-LLM baselines including GPT-5 Nano, DeepSeek V3, and Llama-3.3-70B, using 1,189 resolved questions from KalshiBench. All agents drew on a shared evidence layer built on Exa, and retrieval was filtered by publication date so that differences in accuracy reflect reasoning quality rather than retrieval quality. Independent aggregation with confidence-weighted voting achieved the highest accuracy, at 83.43 percent. The work addresses a trade-off in current oracle systems between fast but brittle automation and accurate but costly human arbitration, and its results suggest that multi-agent setups can self-correct and outperform single-LLM oracles.
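Confidence-weighted voting, the aggregation rule behind the best-performing configuration, can be illustrated with a short Python sketch. It assumes each agent independently returns a proposed resolution plus a self-reported confidence between 0 and 1, and that confidences for the same answer are summed. The `AgentVote` structure, the `confidence_weighted_vote` function, and the agent names are hypothetical; the study's exact weighting and tie-breaking rules may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AgentVote:
    """One agent's independent verdict (hypothetical structure)."""
    agent: str         # illustrative agent identifier
    answer: str        # proposed resolution, e.g. "YES" or "NO"
    confidence: float  # self-reported confidence, expected in [0, 1]


def confidence_weighted_vote(votes: List[AgentVote]) -> Tuple[str, float]:
    """Return the answer with the largest summed confidence and its share of the total weight."""
    if not votes:
        raise ValueError("need at least one vote")
    totals: Dict[str, float] = defaultdict(float)
    for v in votes:
        # Clamp to [0, 1] so a malformed confidence cannot dominate or subtract weight.
        totals[v.answer] += min(max(v.confidence, 0.0), 1.0)
    total_weight = sum(totals.values())
    # max() keeps the first maximal key, so ties go to the answer seen first.
    winner = max(totals, key=totals.get)
    share = totals[winner] / total_weight if total_weight > 0 else 0.0
    return winner, share


votes = [
    AgentVote("agent-a", "YES", 0.95),
    AgentVote("agent-b", "NO", 0.30),
    AgentVote("agent-c", "NO", 0.40),
]
winner, share = confidence_weighted_vote(votes)
print(winner, round(share, 2))  # → YES 0.58
```

Here two low-confidence NO votes are outweighed by one high-confidence YES vote, which is what distinguishes confidence weighting from a simple majority. Ties going to the first answer seen is an arbitrary simplification in this sketch.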

Key facts

  • Study evaluates multi-agent AI oracle systems for prediction market resolution
  • Compared independent aggregation and deliberative consensus against single-LLM baselines
  • Baselines include GPT-5 Nano, DeepSeek V3, and Llama-3.3-70B
  • Tested on 1,189 resolved prediction market questions from KalshiBench
  • All agents share a common evidence layer through Exa
  • Retrieval filtered by publication date to isolate reasoning quality from retrieval quality
  • Independent aggregation with confidence-weighted voting achieves highest accuracy at 83.43 percent
  • Existing oracle systems trade off fast but brittle automation against accurate but costly human arbitration
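The publication-date filter on retrieval can likewise be sketched in Python. This assumes evidence has already been retrieved (the code does not call Exa) and that each item carries an optional publication date. The `Document` type, the `filter_by_publication_date` function, and the inclusive `start`/`end` window are hypothetical, since the study's exact cutoff rule is not described here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Document:
    """A retrieved evidence item (hypothetical; not Exa's response schema)."""
    url: str
    published: Optional[date]  # None when no publication date is known


def filter_by_publication_date(
    docs: List[Document], start: date, end: date
) -> List[Document]:
    """Keep documents published within [start, end], inclusive; drop undated ones."""
    if start > end:
        raise ValueError("start must not be after end")
    return [
        d for d in docs
        if d.published is not None and start <= d.published <= end
    ]


# The same filtered list would be handed to every agent (shared evidence layer).
evidence = [
    Document("https://example.com/a", date(2025, 3, 1)),
    Document("https://example.com/b", date(2025, 5, 20)),
    Document("https://example.com/c", None),
]
shared = filter_by_publication_date(evidence, date(2025, 1, 1), date(2025, 4, 30))
print([d.url for d in shared])  # → ['https://example.com/a']
```

Because every agent receives the same filtered list, accuracy differences between configurations can be attributed to how agents reason over the evidence rather than to what each one retrieved. Dropping undated items is a conservative default chosen for this sketch; a real pipeline might handle them differently.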

Entities

Institutions

  • arXiv
  • KalshiBench
  • Exa

Sources