ARTFEED — Contemporary Art Intelligence

SkillSafetyBench: New Benchmark Exposes Agent Safety Risks from Reusable Skills

ai-technology · 2026-05-13

SkillSafetyBench is a new benchmark for assessing safety risks in large language model agents that use reusable skills. Skills bundle procedural instructions with the ability to manipulate files, tools, memory, and other execution contexts, creating attack surfaces that existing safety evaluations overlook. The benchmark comprises 155 adversarial cases across 47 tasks, spanning six risk domains and 30 safety categories, with each case checked by its own rule-based verifier. Experiments with multiple CLI agents show that localized, non-user attacks can consistently induce unsafe behavior, revealing distinct failure modes and underscoring the need for stronger safeguards against skill-facing risks.
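
To illustrate the idea of a case-specific rule-based verifier, here is a minimal sketch that checks an agent's action trace against per-case safety rules. All names, the trace format, and the rules themselves are illustrative assumptions, not taken from the SkillSafetyBench release.

```python
# Hypothetical sketch of a case-specific rule-based verifier.
# Assumes the agent run produces a trace of (action, argument) tuples;
# the actual benchmark's trace format and rules may differ.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    description: str
    violates: Callable[[tuple[str, str]], bool]  # True if a step is unsafe

def verify(trace: list[tuple[str, str]], rules: list[Rule]) -> list[str]:
    """Return the description of every rule violated anywhere in the trace."""
    return [r.description
            for r in rules
            if any(r.violates(step) for step in trace)]

# Example case: the agent must stay inside its workspace and make no network calls.
rules = [
    Rule("writes outside /workspace",
         lambda s: s[0] == "write_file" and not s[1].startswith("/workspace")),
    Rule("makes outbound network request",
         lambda s: s[0] == "http_request"),
]

trace = [
    ("read_file", "/workspace/notes.txt"),
    ("write_file", "/etc/crontab"),            # unsafe: outside workspace
    ("http_request", "http://example.invalid") # unsafe: outbound request
]

print(verify(trace, rules))
# → ['writes outside /workspace', 'makes outbound network request']
```

Encoding each case's pass/fail criteria as explicit rules over the trace keeps the verdict deterministic and auditable, which is presumably why the benchmark uses rule-based verification rather than model-based judging.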

Key facts

  • SkillSafetyBench is a new benchmark for evaluating agent safety under skill-facing attack surfaces.
  • Reusable skills are a common interface for extending LLM agents, but introduce attack surfaces.
  • The benchmark includes 155 adversarial cases across 47 tasks, spanning 6 risk domains and 30 safety categories.
  • Each case is evaluated with a case-specific rule-based verifier.
  • Experiments used multiple CLI agents and model backends.
  • Localized non-user attacks can consistently induce unsafe behavior.
  • Failure patterns vary across domains, attack methods, and scaffold-model pairings.
  • The research is published on arXiv under ID 2605.12015.

Entities

Institutions

  • arXiv

Sources