SkillGuard-Robust: New Framework for Securing Untrusted AI Agent Skills
A recent study presents SkillGuard-Robust, a framework for auditing untrusted Agent Skills before they are loaded. Agent Skills package SKILL.md files, scripts, and reference materials into reusable components, which makes cross-file security assessment necessary prior to loading. Existing guardrails can flag risks but often fail to consistently recover malicious intent under semantics-preserving rewrites. SkillGuard-Robust reframes pre-load auditing as a robust three-way classification task, combining role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication. Evaluated on SkillGuardBench and two public-ecosystem extensions across five evaluation views (ranging from 254 to 404 packages), the framework achieved an overall exact match of 97.30%, a malicious-risk recall of 98.33%, and an attack exact consistency of 98.89% on the 404-package held-out aggregate. The paper is available on arXiv as 2604.25109.
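To make the three-stage pipeline concrete, here is a minimal sketch of a pre-load skill audit. All names, heuristics, and labels below are illustrative assumptions, not the paper's implementation: it assumes the three classes are "benign", "suspicious", and "malicious", stands in toy keyword matching for evidence extraction, and replaces the semantic verification stage with a placeholder rule.

```python
# Hypothetical sketch of a three-stage pre-load skill audit.
# Everything here (roles, patterns, labels) is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class Evidence:
    path: str       # file the evidence came from
    role: str       # e.g. "manifest", "script", "reference"
    snippet: str    # the suspicious text span

# Stage 1: role-aware evidence extraction (toy per-role keyword heuristics).
RISK_PATTERNS = {
    "script": ["exfiltrate", "curl http", "eval("],
    "manifest": ["ignore previous instructions"],
    "reference": ["send credentials"],
}

def extract_evidence(files: dict) -> list:
    """files maps path -> (role, content); returns flagged spans."""
    found = []
    for path, (role, content) in files.items():
        for pattern in RISK_PATTERNS.get(role, []):
            if pattern in content.lower():
                found.append(Evidence(path, role, pattern))
    return found

# Stage 2: selective semantic verification. A real system would ask a model
# whether each span is genuinely harmful in context; this placeholder
# conservatively confirms only script-role evidence.
def verify(ev: Evidence) -> bool:
    return ev.role == "script"

# Stage 3: consistency-preserving adjudication into the three classes.
def adjudicate(evidence: list) -> str:
    confirmed = [ev for ev in evidence if verify(ev)]
    if confirmed:
        return "malicious"
    if evidence:
        return "suspicious"
    return "benign"

skill = {
    "SKILL.md": ("manifest", "A helpful formatting skill."),
    "run.py": ("script", "eval(input())  # runs arbitrary input"),
}
verdict = adjudicate(extract_evidence(skill))
print(verdict)  # -> malicious
```

The three-way output (rather than a binary allow/block) is what lets an adjudicator stay consistent under rewrites: a rephrased skill that loses a surface keyword can still land in the "suspicious" class instead of flipping all the way to "benign".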
Key facts
- SkillGuard-Robust addresses security auditing of untrusted Agent Skills.
- Agent Skills package SKILL.md files, scripts, reference documents, and repository context.
- Existing guardrails inconsistently recover malicious intent under semantics-preserving rewrites.
- SkillGuard-Robust uses role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication.
- Evaluated on SkillGuardBench and two public-ecosystem extensions.
- Five evaluation views ranged from 254 to 404 packages.
- On the 404-package held-out aggregate: 97.30% exact match, 98.33% malicious-risk recall, 98.89% attack exact consistency.
- Paper ID: arXiv:2604.25109.
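The metric names above can be read as standard classification measures. The following toy computation assumes "exact match" means the predicted label equals the gold label and "malicious-risk recall" is recall restricted to the malicious class; the labels are made up for illustration and the numbers do not reproduce the paper's results.

```python
# Toy illustration of the reported metric types (assumed definitions;
# the data below is invented, not from the paper's evaluation).
gold = ["malicious", "benign", "suspicious", "malicious", "benign"]
pred = ["malicious", "benign", "benign",     "malicious", "benign"]

# Exact match: fraction of packages whose predicted label equals the gold label.
exact_match = sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Malicious-risk recall: fraction of gold-malicious packages caught.
mal_idx = [i for i, g in enumerate(gold) if g == "malicious"]
mal_recall = sum(pred[i] == "malicious" for i in mal_idx) / len(mal_idx)

print(f"{exact_match:.2%} {mal_recall:.2%}")  # -> 80.00% 100.00%
```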