ARTFEED — Contemporary Art Intelligence

Herculean: Benchmarking AI Agents for Financial Workflows

ai-technology · 2026-05-16

A new benchmark called Herculean evaluates AI agents on complex financial tasks. Unlike existing benchmarks, which test static skills such as question answering or summarization, Herculean assesses agents across four representative workflows: Trading, Hedging, Market Insights, and Auditing. Each workflow is implemented as a standardized environment built on the Model Context Protocol (MCP), with its own tools, constraints, and success criteria. Tests on frontier agents show strong performance in Trading and Market Insights but substantial difficulty in Hedging and Auditing, particularly on long-horizon tasks.
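To make the idea of a workflow environment with tools, constraints, and success criteria more concrete, here is a minimal Python sketch. It is not taken from the paper: the names WorkflowEnv, run_episode, evaluate, and toy_agent, the tool names, the step budget, and all numeric values are hypothetical, and the sketch does not use any real MCP library. It only illustrates how heterogeneous agents could be scored end to end against the same environment definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Final state reported by an agent at the end of an episode (hypothetical shape).
State = Dict[str, float]


@dataclass(frozen=True)
class WorkflowEnv:
    """Hypothetical description of one workflow environment."""
    name: str
    tools: List[str]                     # tool names the agent is allowed to call
    max_steps: int                       # constraint: step budget for the episode
    is_success: Callable[[State], bool]  # success criterion on the final state


def run_episode(env: WorkflowEnv, agent: Callable[[WorkflowEnv], State]) -> bool:
    """Run one agent in one environment; pass only if the constraint and the criterion both hold."""
    state = agent(env)
    within_budget = state.get("steps", 0) <= env.max_steps
    return within_budget and env.is_success(state)


def evaluate(envs: List[WorkflowEnv], agent: Callable[[WorkflowEnv], State]) -> Dict[str, bool]:
    """Score the same agent on every workflow with identical rules."""
    return {env.name: run_episode(env, agent) for env in envs}


def toy_agent(env: WorkflowEnv) -> State:
    """Stand-in agent returning made-up numbers; a real agent would call the tools."""
    return {"steps": 5, "pnl": 12.0, "net_delta": 0.4}


ENVS = [
    WorkflowEnv("Trading", ["get_quote", "place_order"], 10,
                lambda s: s["pnl"] > 0),
    WorkflowEnv("Hedging", ["get_positions", "place_order"], 10,
                lambda s: abs(s["net_delta"]) < 0.05),
]

results = evaluate(ENVS, toy_agent)
print(results)
```

Because every environment exposes the same three pieces (tools, a constraint, a success check), any agent that can be wrapped as a callable can be scored the same way. The values returned by toy_agent are invented for illustration and are not benchmark results.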

Key facts

  • Herculean is presented as the first skill-based benchmark for agentic financial intelligence.
  • It covers four workflows: Trading, Hedging, Market Insights, and Auditing.
  • Each workflow uses a standardized MCP-based skill environment.
  • Frontier agents perform well on Trading and Market Insights.
  • Agents struggle substantially on Hedging and Auditing.
  • The benchmark enables consistent end-to-end assessment of heterogeneous agent systems.
  • Existing financial benchmarks evaluate only static competencies such as question answering and summarization.
  • The paper is available on arXiv under ID 2605.14355.

Entities

Institutions

  • arXiv
