AgentTrap Benchmark Exposes Security Flaws in Third-Party LLM Agent Skills

ai-technology · 2026-05-16

AgentTrap is a benchmark for assessing whether LLM agents can use third-party skills without taking harmful actions at runtime. Third-party skills, which bundle natural-language instructions, helper scripts, templates, documents, and service configurations, are becoming the package ecosystem for LLM agents. They also create a security risk: a malicious skill can hide harmful actions inside an ordinary-looking workflow and exploit the agent's high-level permissions and limited human oversight. AgentTrap comprises 141 tasks (91 malicious and 50 benign) covering 16 security-impact dimensions grounded in agent-skill supply-chain threats. In each task, the agent runs with potentially harmful skills installed and receives an ordinary user request, so the benchmark measures runtime trust failures under realistic conditions.

Key facts

AgentTrap is a dynamic benchmark for LLM agent security.
Third-party skills are becoming the package ecosystem for LLM agents.
Skills package natural-language instructions, helper scripts, templates, documents, and service configurations.
Malicious skills can disguise harmful behavior as routine workflow.
AgentTrap contains 141 tasks: 91 malicious and 50 benign.
Tasks cover 16 security-impact dimensions.
Dimensions are grounded in agent-skill supply-chain threats.
Agents run with installed skills and receive ordinary user requests.
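
To make the task structure concrete, here is a minimal Python sketch of how a benchmark of this shape could be modeled and tallied. It is an illustration only, not AgentTrap's actual schema or scoring: the class names, field names, the example dimension label, and the two summary rates are all hypothetical assumptions, since the article does not describe the data format or the metrics used.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical model of a third-party skill package."""
    name: str
    instructions: str                              # natural-language instructions
    scripts: list[str] = field(default_factory=list)
    templates: list[str] = field(default_factory=list)
    config: dict[str, str] = field(default_factory=dict)


@dataclass
class Task:
    """One task: an ordinary user request plus an installed skill."""
    user_request: str
    skill: Skill
    malicious: bool       # True if the installed skill hides harmful behavior
    dimension: str        # assumed label for one security-impact dimension


@dataclass
class Outcome:
    """Assumed record of what the agent did on one task."""
    task: Task
    harmful_action_taken: bool
    request_completed: bool


def summarize(outcomes: list[Outcome]) -> dict[str, float]:
    """Tally outcomes separately for malicious and benign tasks.

    Both rates are illustrative metrics, not ones reported by the benchmark.
    """
    malicious = [o for o in outcomes if o.task.malicious]
    benign = [o for o in outcomes if not o.task.malicious]
    harmed = sum(o.harmful_action_taken for o in malicious)
    completed = sum(o.request_completed for o in benign)
    return {
        "malicious_tasks": len(malicious),
        "benign_tasks": len(benign),
        "attack_success_rate": harmed / len(malicious) if malicious else 0.0,
        "benign_completion_rate": completed / len(benign) if benign else 0.0,
    }
```

Keeping the malicious and benign tallies separate reflects the split described above: an agent that refuses every skill would score well on the 91 malicious tasks but fail the 50 benign ones, so both sides need to be visible.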
