ExploitGym Benchmark Tests AI Agents' Ability to Turn Vulnerabilities into Attacks

ai-technology · 2026-05-13

ExploitGym, a newly introduced benchmark, assesses how well AI agents can turn software vulnerabilities into concrete security impact, such as unauthorized file access or code execution. The task demands low-level reasoning about programs, adaptation to runtime behavior, and sustained progress over long horizons. Exploitation is dual-use: it supports defensive work while also making offensive operations easier. Despite that significance, exploitation capability is rarely covered in evaluations. ExploitGym aims to fill this gap with a large-scale, diverse, and realistic benchmark. Agents start from a program input that triggers a vulnerability and must extend it step by step into a working exploit. The research is published on arXiv.

Key facts

ExploitGym is a benchmark for AI agents' exploitation capabilities.
Exploitation turns a vulnerability into a concrete security impact.
The task requires low-level program reasoning and runtime adaptation.
Exploitation is dual-use: defensive and offensive.
The benchmark is large-scale, diverse, and realistic.
Agents are tasked with extending a vulnerability trigger into a working exploit.
The research is published on arXiv.
Exploitation remains under-evaluated despite its importance.
