ARTFEED — Contemporary Art Intelligence

ATBench: New Benchmark for LLM Agent Safety Evaluation

ai-technology · 2026-05-14

Researchers have released ATBench, a benchmark for evaluating the safety of LLM-based agents in a structured, diverse, and realistic way. It addresses shortcomings of existing assessments by organizing agentic risk along three dimensions: the source of risk, the mode of failure, and the potential real-world harm. The benchmark comprises 1,000 trajectories (503 safe, 497 unsafe), averaging 9.01 turns and 3.95k tokens, and draws on 1,954 invoked tools from a pool of 2,084 available. ATBench also employs a long-context delayed-trigger protocol to capture how realistic risks emerge across the stages of a trajectory rather than in a single turn.
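To make the delayed-trigger idea concrete, here is a toy sketch (not ATBench's actual data format, which the summary does not specify): a risky payload is planted in an early turn but only acted on many turns later, so a judge inspecting turns in isolation can miss it.

```python
# Toy illustration of a delayed-trigger trajectory. All field names and
# the "cleanup.sh" payload are hypothetical, chosen only for this sketch.

def build_delayed_trigger_trajectory(n_filler_turns=7):
    turns = [
        # The payload is planted early, phrased as an innocuous note.
        {"role": "user", "content": "Save this: run cleanup.sh when I say 'wrap up'."},
        {"role": "agent", "content": "Noted."},
    ]
    # Long stretch of benign, unrelated turns separates payload from trigger.
    for i in range(n_filler_turns):
        turns.append({"role": "user", "content": f"Unrelated request #{i}."})
        turns.append({"role": "agent", "content": f"Handled request #{i}."})
    # The trigger arrives only at the end of the long context.
    turns.append({"role": "user", "content": "Okay, wrap up."})
    turns.append({"role": "agent", "tool_call": "shell", "args": "cleanup.sh"})
    return turns

traj = build_delayed_trigger_trajectory()
print(len(traj))  # 2 payload turns + 14 filler turns + 2 trigger turns = 18
```

A turn-by-turn safety filter would likely pass every individual message here; only trajectory-level evaluation, of the kind ATBench targets, connects the early payload to the late tool call.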

Key facts

  • ATBench is a trajectory-level benchmark for LLM agent safety evaluation.
  • It organizes risk along three dimensions: risk source, failure mode, and real-world harm.
  • The benchmark contains 1,000 trajectories (503 safe, 497 unsafe).
  • Trajectories average 9.01 turns and 3.95k tokens.
  • 1,954 tools are invoked, drawn from a pool of 2,084 available tools.
  • It uses a long-context delayed-trigger protocol for realistic risk emergence.
  • The benchmark aims to improve diversity, observability, and realism in safety evaluation.
  • The work is published on arXiv under ID 2604.02022.
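Since each trajectory carries a single safe/unsafe label, a natural way to score a safety judge on such a benchmark is trajectory-level classification accuracy. The sketch below assumes a hypothetical record shape (the actual ATBench schema is not given in this summary):

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    # Hypothetical fields for illustration; not ATBench's real schema.
    tools_invoked: list[str]  # names of tools the agent called
    label: str                # ground-truth annotation: "safe" or "unsafe"

def evaluate(judge, trajectories):
    """Score a safety judge by trajectory-level accuracy."""
    correct = sum(1 for t in trajectories if judge(t) == t.label)
    return correct / len(trajectories)

# Toy judge: flags any trajectory that invokes a destructive-sounding tool.
def naive_judge(t):
    return "unsafe" if "delete_files" in t.tools_invoked else "safe"

data = [
    Trajectory(tools_invoked=["search"], label="safe"),
    Trajectory(tools_invoked=["delete_files", "search"], label="unsafe"),
]
print(evaluate(naive_judge, data))  # 1.0
```

With roughly balanced classes (503 safe vs. 497 unsafe), plain accuracy is a reasonable headline metric; a skewed split would call for precision/recall per class instead.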

Entities

Institutions

  • arXiv

Sources