AgentTrap Benchmark Exposes Security Flaws in Third-Party LLM Agent Skills
AgentTrap is a benchmark for assessing whether LLM agents can use third-party skills without carrying out harmful runtime actions. Third-party skills, which bundle natural-language instructions, helper scripts, templates, documents, and service configurations, are increasingly becoming the package ecosystem for LLM agents. They also pose a significant security risk: a malicious skill can hide harmful actions inside an otherwise ordinary workflow, exploiting the agent's high-level permissions and minimal human oversight. AgentTrap comprises 141 tasks (91 malicious and 50 benign) covering 16 security-impact dimensions grounded in agent-skill supply-chain threats. In each task, the agent runs with potentially harmful skills installed and receives an ordinary user request, so the benchmark measures runtime trust failures under realistic conditions.
Key facts
- AgentTrap is a dynamic benchmark for LLM agent security.
- Third-party skills are becoming the package ecosystem for LLM agents.
- Skills package instructions, scripts, templates, documents, and configuration.
- Malicious skills can disguise harmful behavior as routine workflow.
- AgentTrap contains 141 tasks: 91 malicious and 50 benign.
- Tasks cover 16 security-impact dimensions.
- Dimensions are grounded in agent-skill supply-chain threats.
- Agents run with installed skills and receive ordinary user requests.
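
The composition described above can be sketched in Python as a simple task record plus a tally. This is only an illustrative sketch, not AgentTrap's actual schema: the `SkillTask` class, its field names, the placeholder requests, and the round-robin assignment of tasks to dimensions are all assumptions made for the example. Only the counts (141 tasks, 91 malicious, 50 benign, 16 dimensions) come from the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SkillTask:
    """Hypothetical task record; field names are assumptions, not the benchmark's schema."""
    task_id: str
    user_request: str          # ordinary-looking request given to the agent
    malicious: bool            # True if the installed skill hides harmful behavior
    dimension: Optional[int]   # assumed index of a security-impact dimension; None if benign


def build_placeholder_suite(n_malicious=91, n_benign=50, n_dimensions=16):
    """Build a stand-in suite with the reported counts.

    Dimensions are assigned round-robin purely as a placeholder; the real
    distribution of tasks across dimensions is not stated in the text.
    """
    tasks = [
        SkillTask(f"mal-{i:03d}", "placeholder request", True, i % n_dimensions)
        for i in range(n_malicious)
    ]
    tasks += [
        SkillTask(f"ben-{i:03d}", "placeholder request", False, None)
        for i in range(n_benign)
    ]
    return tasks


def summarize(tasks):
    """Count tasks by label and the number of distinct dimensions among malicious tasks."""
    labels = Counter("malicious" if t.malicious else "benign" for t in tasks)
    dimensions = {t.dimension for t in tasks if t.malicious}
    return {
        "total": len(tasks),
        "malicious": labels["malicious"],
        "benign": labels["benign"],
        "dimensions": len(dimensions),
    }


if __name__ == "__main__":
    print(summarize(build_placeholder_suite()))
```

Keeping benign tasks in the same structure as malicious ones reflects the stated design: the agent sees an ordinary request either way, so any evaluation harness built on such records would need to distinguish the two cases from the label, not from the request.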
Entities
Institutions
- arXiv