ExploitBench: Capability Ladder Benchmark for LLM Cybersecurity Agents
ExploitBench is a benchmark that decomposes exploitation into 16 measurable flags, including coverage, crash, sandbox primitives, arbitrary read/write, control-flow hijack, and arbitrary code execution. Each capability is verified by a deterministic oracle that combines per-run randomized challenge-response for primitives, differential execution against ground-truth binaries, and a signal-handler proof for code execution. The benchmark is instantiated on 41 bugs in V8, a target chosen for its popularity. Unlike existing LLM security benchmarks, which treat a crash as a successful exploit, ExploitBench models exploitation as a ladder of capabilities, progressing from merely reaching the vulnerable code to gaining full control of the target.
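The difference between crash-only scoring and ladder scoring can be illustrated with a minimal sketch. It is not the benchmark's implementation: the `Rung` enum, its ordering, and both scoring functions are hypothetical, and only the six flags named above are modeled, not the full set of 16.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Rung(IntEnum):
    """Six of the 16 flags, in an assumed (illustrative) ladder order."""
    COVERAGE = 1
    CRASH = 2
    SANDBOX_PRIMITIVE = 3
    ARBITRARY_RW = 4
    CONTROL_FLOW_HIJACK = 5
    CODE_EXECUTION = 6


def highest_rung(verified: Iterable[Rung]) -> Optional[Rung]:
    """Ladder-style score: the highest flag an oracle has verified, if any."""
    return max(verified, default=None)


def crash_only_success(verified: Iterable[Rung]) -> bool:
    """Crash-as-success scoring, the approach the text attributes to prior benchmarks."""
    return Rung.CRASH in set(verified)
```

Under these assumptions, an agent that only crashes the target and one that reaches arbitrary read/write both pass `crash_only_success`, while `highest_rung` reports them as `Rung.CRASH` and `Rung.ARBITRARY_RW` respectively.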
Key facts
- ExploitBench decomposes exploitation into 16 measurable flags.
- Flags include coverage, crash, sandbox primitives, arbitrary read/write, control-flow hijack, and arbitrary code execution.
- Each capability is verified by a deterministic oracle.
- The oracle uses per-run randomized challenge-response for primitives.
- Differential execution against ground-truth binaries measures progress.
- A signal-handler proof is used for code execution.
- ExploitBench is instantiated on 41 V8 bugs.
- Existing LLM security benchmarks treat a crash as exploitation success.
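
As a rough illustration of per-run randomized challenge-response, the sketch below checks a claimed arbitrary-read primitive against a toy memory model. Everything in it is an assumption made for illustration: the `ReadChallenge` class, the dictionary standing in for process memory, and the address range are hypothetical and do not describe how ExploitBench's oracle is built.

```python
import hmac
import secrets


class ReadChallenge:
    """Hypothetical oracle check for an arbitrary-read primitive.

    A fresh random secret is planted at a random address on every run,
    so a hard-coded or replayed answer cannot pass verification.
    """

    def __init__(self, memory: dict) -> None:
        # Toy memory model: maps an address to the bytes stored there.
        self.address = 0x1000 + 16 * secrets.randbelow(0x1000)
        self._secret = secrets.token_bytes(16)
        memory[self.address] = self._secret

    def verify(self, response: bytes) -> bool:
        # Deterministic given the planted secret; constant-time comparison.
        return hmac.compare_digest(response, self._secret)


def honest_agent(memory: dict, address: int) -> bytes:
    """Stand-in for an exploit that really can read the challenged address."""
    return memory[address]
```

In this toy setup, `ReadChallenge(memory).verify(honest_agent(memory, challenge.address))` succeeds only when the response matches the bytes planted for that run, so an answer recorded from an earlier run would be rejected.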