ARTFEED — Contemporary Art Intelligence

Systematic Analysis of AI Agent Safety Benchmarks Reveals Inconsistencies

ai-technology · 2026-05-20

A new study posted on arXiv presents what its authors describe as the first systematic analysis of safety benchmarks for LLM-based autonomous agents. It identifies significant inconsistencies in threat models, metrics, and risk coverage. The research catalogs 40 behavioral agent-safety benchmarks published between 2023 and 2026, plus five adjacent evaluator, defense, and dataset artifacts, and proposes a six-axis taxonomy for evaluating benchmark methodology. A coverage matrix shows broad risk coverage but limited methodological convergence, and most benchmarks are concentrated in sandboxed, constrained, and safety-only environments. The study highlights the need for standardized evaluation frameworks as agent deployment accelerates.

Key facts

  • First systematic analysis dedicated to agent safety benchmarks as evaluation instruments.
  • Cataloged 40 behavioral agent-safety benchmarks published between 2023 and 2026.
  • Also includes 5 adjacent evaluator, defense, and dataset artifacts.
  • Proposes a six-axis taxonomy of benchmark evaluation methodology.
  • Coverage matrix reveals broad risk coverage but limited methodological convergence.
  • Behavioral-benchmark core concentrated in sandboxed, constrained, and often safety-only environments.
  • Benchmarks developed independently with inconsistent threat models and incompatible metrics.
  • Study addresses agent safety risks that extend beyond traditional LLM safety concerns.

Entities

Institutions

  • arXiv

Sources