ActuBench: Multi-Agent LLM Pipeline for Actuarial Reasoning

ai-technology · 2026-04-24

ActuBench is a multi-agent LLM pipeline for generating and evaluating actuarial assessment items aligned with the International Actuarial Association (IAA) Education Syllabus. It assigns LLMs four roles: one drafts items, a second constructs distractors, a third verifies both outputs and drives a bounded repair loop, and a cost-optimized auxiliary agent summarizes Wikipedia notes and labels topics. All items, model responses, and the complete leaderboard can be browsed through the web interface at https://actubench.de/en/. The pipeline was used to evaluate 50 language models from eight providers on two benchmarks: 100 hard multiple-choice items and 100 open-ended items scored by an LLM judge. The reported findings are that collaboration among agents improves item quality, the repair loop corrects errors, and the cost-optimized auxiliary agent lowers operating costs.
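The draft, distract, verify, and repair flow described above can be pictured with a minimal Python sketch. It is not ActuBench's actual code: the `Item` class, the `generate_item` function, the `max_repairs` limit, the separate `repair` callable, and the stub roles are all hypothetical names introduced here, and the stubs stand in for real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Item:
    question: str
    answer: str
    distractors: List[str] = field(default_factory=list)


def generate_item(
    topic: str,
    draft: Callable[[str], Item],
    write_distractors: Callable[[Item], List[str]],
    verify: Callable[[Item], List[str]],
    repair: Callable[[Item, List[str]], Item],
    max_repairs: int = 2,  # assumed bound; the real limit is not stated in the source
) -> Optional[Item]:
    """Draft an item, add distractors, then verify with a bounded repair loop."""
    item = draft(topic)
    item.distractors = write_distractors(item)
    for attempt in range(max_repairs + 1):
        problems = verify(item)  # an empty list means the item is accepted
        if not problems:
            return item
        if attempt < max_repairs:
            item = repair(item, problems)
    return None  # still failing after max_repairs repairs: discard the item


# Stub roles standing in for LLM calls (purely illustrative).
def stub_draft(topic: str) -> Item:
    return Item(question=f"Placeholder question on {topic}?", answer="A")


def stub_distractors(item: Item) -> List[str]:
    return ["A", "B", "C"]  # deliberately flawed: repeats the correct answer


def stub_verify(item: Item) -> List[str]:
    return ["distractor equals answer"] if item.answer in item.distractors else []


def stub_repair(item: Item, problems: List[str]) -> Item:
    kept = [d for d in item.distractors if d != item.answer]
    return Item(item.question, item.answer, kept)


accepted = generate_item(
    "life annuities", stub_draft, stub_distractors, stub_verify, stub_repair
)
rejected = generate_item(
    "life annuities", stub_draft, stub_distractors, stub_verify, stub_repair,
    max_repairs=0,
)
```

In this sketch the flawed distractor set is fixed on the first repair pass, so `accepted` is a valid item, while `rejected` is `None` because no repairs are allowed. Capping the loop keeps cost predictable: an item that cannot be fixed within the limit is dropped rather than retried indefinitely.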

Key facts

ActuBench is a multi-agent LLM pipeline for actuarial assessment generation and evaluation.
Aligned with the International Actuarial Association (IAA) Education Syllabus.
Four LLM roles: item drafter, distractor constructor, verifier with a bounded repair loop, and a cost-optimized auxiliary agent for Wikipedia-note summarization and topic labeling.
Web interface at https://actubench.de/en/ provides browsable items and leaderboard.
Evaluated 50 language models from eight providers.
Two benchmarks: 100 hard multiple-choice items and 100 open-ended items scored by an LLM judge.
Multi-agent collaboration improves item quality.
Repair loop effectively corrects errors.
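
For the multiple-choice benchmark, a leaderboard can be derived from per-item responses by computing each model's accuracy and sorting. The Python sketch below assumes a simple `(model, chosen_option, correct_option)` record format; the function names and the sample model names are hypothetical, and it does not cover the open-ended items scored by an LLM judge.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Response = Tuple[str, str, str]  # (model, chosen_option, correct_option)


def mc_accuracy(responses: Iterable[Response]) -> Dict[str, float]:
    """Return the fraction of correctly answered items per model."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for model, chosen, key in responses:
        total[model] += 1
        correct[model] += int(chosen == key)
    return {model: correct[model] / total[model] for model in total}


def leaderboard(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort models by descending accuracy, breaking ties by name."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample data, not ActuBench results.
sample = [
    ("model-a", "A", "A"),
    ("model-a", "B", "C"),
    ("model-b", "D", "D"),
]
ranking = leaderboard(mc_accuracy(sample))
```

With the made-up sample, `model-b` answers its single item correctly and ranks above `model-a`, which gets one of two right.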
