FinChain: Verifiable Chain-of-Thought Benchmark for Financial Reasoning

ai-technology · 2026-05-01

Researchers have introduced FinChain, the first benchmark designed for verifiable Chain-of-Thought evaluation in finance. It covers 58 topics across 12 financial domains and pairs parameterized symbolic templates with executable Python code, which allows scalable, contamination-free data generation. The accompanying CHAINEVAL metric scores both final-answer correctness and the consistency of each reasoning step. An evaluation of 26 leading LLMs shows that even frontier models have clear limitations in multi-step symbolic reasoning.
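To make the idea of a parameterized symbolic template concrete, here is a minimal sketch in Python. It is not taken from FinChain: the function name `compound_interest_template`, the parameter ranges, and the output layout are all assumptions chosen for illustration. It shows how sampling parameters from a seed and computing every intermediate value in code can yield a fresh question together with a machine-checkable reasoning trace.

```python
import random


def compound_interest_template(seed: int) -> dict:
    """Hypothetical template: sample parameters, then compute each step in code."""
    rng = random.Random(seed)  # seeded, so the same seed reproduces the same instance

    # Sampled parameters (ranges are illustrative assumptions).
    principal = rng.randrange(1_000, 10_001, 500)
    rate_pct = rng.randint(2, 9)
    years = rng.randint(2, 5)

    # Every intermediate quantity is computed, not written by hand.
    growth = (1 + rate_pct / 100) ** years
    final_value = principal * growth
    interest = final_value - principal

    question = (
        f"An investor deposits ${principal} at {rate_pct}% annual interest, "
        f"compounded yearly, for {years} years. How much interest is earned?"
    )
    steps = [
        ("growth_factor", round(growth, 6)),
        ("final_value", round(final_value, 2)),
        ("interest", round(interest, 2)),
    ]
    return {"question": question, "steps": steps, "answer": round(interest, 2)}


if __name__ == "__main__":
    instance = compound_interest_template(seed=7)
    print(instance["question"])
    for name, value in instance["steps"]:
        print(f"{name} = {value}")
```

Because the numbers are sampled at generation time, new instances can be produced on demand, which is one plausible reading of how template-based generation avoids contamination from memorized test items.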

Key facts

FinChain is the first benchmark for verifiable Chain-of-Thought evaluation in finance.
It covers 58 topics across 12 financial domains.
It uses parameterized symbolic templates with executable Python code.
This design enables fully machine-verifiable reasoning and contamination-free data generation.
CHAINEVAL is a dynamic alignment measure that evaluates both the final answer and the step-level reasoning.
26 leading LLMs were evaluated.
Frontier LLMs show clear limitations in multi-step symbolic reasoning.
Existing datasets like FinQA and ConvFinQA neglect intermediate reasoning steps.
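
The difference between final-answer scoring and step-level scoring can be illustrated with a toy scorer. This is not the CHAINEVAL definition, which the summary above does not spell out; `score_chain`, its in-order matching rule, and the 1% relative tolerance are assumptions made purely for illustration.

```python
import math


def score_chain(predicted, gold, rel_tol=1e-2):
    """Toy scorer (hypothetical): match predicted step values to gold values in order."""
    matched = 0
    cursor = 0  # next index in `predicted` that is still available for matching
    for target in gold:
        for i in range(cursor, len(predicted)):
            if math.isclose(predicted[i], target, rel_tol=rel_tol):
                matched += 1
                cursor = i + 1  # later gold steps may only match later predictions
                break
    step_recall = matched / len(gold) if gold else 0.0
    final_correct = (
        bool(predicted)
        and bool(gold)
        and math.isclose(predicted[-1], gold[-1], rel_tol=rel_tol)
    )
    return {"step_recall": step_recall, "final_correct": final_correct}


if __name__ == "__main__":
    gold_steps = [1.157625, 5788.13, 788.13]
    # The model reaches the right final value but reports a wrong middle step.
    model_steps = [1.157625, 6000.0, 788.13]
    print(score_chain(model_steps, gold_steps))
```

In this example the final answer is judged correct while only two of the three gold steps are matched, which is the kind of gap that a final-answer-only dataset would not reveal.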

Entities

—

Sources

arXiv cs.AI — 2026-05-01