ExploitBench: Capability Ladder Benchmark for LLM Cybersecurity Agents

other · 2026-05-16

ExploitBench is a benchmark that decomposes exploitation into 16 measurable flags, spanning coverage, crash, sandbox primitives, arbitrary read/write, control-flow hijack, and arbitrary code execution. Each capability is verified by a deterministic oracle: per-run randomized challenge-response for primitives, differential execution against ground-truth binaries to measure progress, and a signal-handler proof for code execution. The benchmark is instantiated on 41 V8 bugs, with V8 chosen for its popularity. Unlike existing LLM security benchmarks, which treat a crash as exploitation success, ExploitBench models exploitation as a ladder of capabilities, running from merely executing the faulty line of code to gaining full control over the target system.
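The contrast between crash-only scoring and a capability ladder can be sketched as below. This is a minimal illustration, not the benchmark's actual schema: the `Rung` enum lists only the six capability categories named above (the full set has 16 flags), the strict ordering is an assumption, and `highest_rung` and `crash_only_success` are hypothetical helper names.

```python
from enum import IntEnum


class Rung(IntEnum):
    """Hypothetical ladder built from the six categories named in the summary.

    The real benchmark defines 16 flags; the ordering here is an assumption.
    """
    COVERAGE = 1
    CRASH = 2
    SANDBOX_PRIMITIVE = 3
    ARBITRARY_READ_WRITE = 4
    CONTROL_FLOW_HIJACK = 5
    ARBITRARY_CODE_EXECUTION = 6


def highest_rung(verified):
    """Return the highest verified rung, or None if nothing was verified."""
    return max(verified, default=None)


def crash_only_success(verified):
    """Crash-as-success scoring: a single boolean, blind to later rungs."""
    return Rung.CRASH in verified


# Two runs that crash-only scoring cannot tell apart.
run_a = {Rung.COVERAGE, Rung.CRASH}
run_b = {Rung.COVERAGE, Rung.CRASH, Rung.SANDBOX_PRIMITIVE, Rung.ARBITRARY_READ_WRITE}

print(crash_only_success(run_a), crash_only_success(run_b))
print(highest_rung(run_a).name, highest_rung(run_b).name)
```

Under these assumptions, both runs count as a success for a crash-only benchmark, while the ladder view separates a run that stops at a crash from one that reaches an arbitrary read/write primitive.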

Key facts

ExploitBench decomposes exploitation into 16 measurable flags.
Flags include coverage, crash, sandbox primitives, arbitrary read/write, control-flow hijack, and arbitrary code execution.
Each capability is verified by a deterministic oracle.
The oracle uses per-run randomized challenge-response for primitives.
Differential execution against ground-truth binaries measures progress.
A signal-handler proof is used for code execution.
ExploitBench is instantiated on 41 V8 bugs.
Existing LLM security benchmarks treat a crash as exploitation success.
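The per-run randomized challenge-response idea for primitives can be illustrated with a toy simulation. The sketch assumes a dictionary standing in for target memory and a callable standing in for the agent's exploit; `ReadChallenge`, `run_once`, and the address range are hypothetical, and the source does not describe the oracle's real interface. The point it shows is that a fresh secret per run makes a hard-coded or replayed answer fail, so only a working read primitive passes.

```python
import hmac
import secrets


class ReadChallenge:
    """Toy oracle for an arbitrary-read primitive (illustrative assumption)."""

    def __init__(self, memory, address, size=16):
        self.address = address
        # A fresh random secret per run, so earlier answers cannot be replayed.
        self._expected = secrets.token_bytes(size)
        memory[address] = self._expected

    def verify(self, response):
        # Deterministic pass/fail: the response must equal the planted secret.
        return hmac.compare_digest(self._expected, response)


def run_once(exploit):
    """Plant a secret at a random address and check what the exploit returns."""
    memory = {}
    address = 0x1000 + secrets.randbelow(0x1000)
    challenge = ReadChallenge(memory, address)

    def read(addr):
        # Simulated read primitive over the toy memory.
        return memory.get(addr, b"")

    response = exploit(read, challenge.address)
    return challenge.verify(response)


# An exploit with a working read primitive passes the challenge.
print(run_once(lambda read, addr: read(addr)))
# An exploit that guesses a fixed value fails it.
print(run_once(lambda read, addr: b"\x00" * 16))
```

The same pattern would extend to a write primitive by asking the exploit to place an oracle-chosen random value at a chosen address and then inspecting memory afterwards.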

Entities

—

Sources

arXiv cs.AI — 2026-05-16