Proteus Framework Measures Adaptive Leakage in LLM Agent Skills

other · 2026-05-13

A recent study introduces Proteus, a grey-box, self-evolving red-team framework for measuring adaptive leakage in LLM agent skills. Agent skills extend LLMs with reusable instructions, tool interfaces, and executable code, and users frequently install third-party skills from marketplaces and repositories. The authors argue that deployment risk cannot be assessed with single-shot audits or prompt-level red teams alone, because attackers can iteratively rewrite a skill using feedback from audits and runtime behavior. Proteus searches a five-axis skill-attack space and evaluates each candidate through an audit-sandbox-oracle pipeline, which returns structured feedback that guides the next skill mutation. In this way the framework models an attacker with a limited budget who keeps revising a skill until it both passes the audit and causes verified runtime damage. The paper is available on arXiv under the identifier 2605.11891.
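The article does not include code, so the loop below is only a toy sketch of the general idea of a feedback-driven, budget-limited search: each round a candidate goes through an audit, a sandbox run, and an oracle check, and the result steers the next mutation. It assumes a candidate can be reduced to five integer "axes" and replaces the auditor, sandbox, and oracle with trivial threshold checks. All names (`Candidate`, `audit`, `sandbox_oracle`, `mutate`, `red_team`) and thresholds are hypothetical; they are not Proteus's actual interface, axes, or search strategy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in: a candidate skill summarized by five integer "axes".
# The real Proteus axes are not described here; these values are placeholders.
@dataclass(frozen=True)
class Candidate:
    axes: Tuple[int, int, int, int, int]

@dataclass
class Feedback:
    audit_passed: bool
    harm_verified: bool
    flagged_axis: Optional[int]  # axis the toy auditor objected to, if any

def audit(c: Candidate, threshold: int = 3) -> Tuple[bool, Optional[int]]:
    """Toy auditor: rejects the candidate and names the first axis above threshold."""
    for i, v in enumerate(c.axes):
        if v > threshold:
            return False, i
    return True, None

def sandbox_oracle(c: Candidate, required: int = 10) -> bool:
    """Toy sandbox + oracle: 'runtime damage' counts as verified once the axes sum to a target."""
    return sum(c.axes) >= required

def evaluate(c: Candidate) -> Feedback:
    passed, flagged = audit(c)
    return Feedback(passed, sandbox_oracle(c), flagged)

def mutate(c: Candidate, fb: Feedback) -> Candidate:
    """Use structured feedback to pick the next candidate."""
    axes = list(c.axes)
    if fb.flagged_axis is not None:
        axes[fb.flagged_axis] -= 1  # back off the axis the auditor flagged
    else:
        axes[axes.index(min(axes))] += 1  # audit is clean: strengthen the weakest axis
    return Candidate(tuple(axes))

def red_team(seed: Candidate, budget: int) -> Optional[Tuple[int, Candidate]]:
    """Return (round, candidate) for the first candidate that passes the audit
    AND has verified damage within the budget, or None if the budget runs out."""
    c = seed
    for round_no in range(1, budget + 1):
        fb = evaluate(c)
        if fb.audit_passed and fb.harm_verified:
            return round_no, c
        c = mutate(c, fb)
    return None

if __name__ == "__main__":
    seed = Candidate((5, 5, 0, 0, 0))
    print(audit(seed))                # a single-shot audit rejects the seed
    print(red_team(seed, budget=5))   # too small a budget: no success
    print(red_team(seed, budget=20))  # adaptive rewriting eventually succeeds
```

In this toy run the seed fails a one-time audit, yet with a budget of 20 rounds the search reaches an audit-passing candidate at round 9, while a budget of 5 is not enough. That contrast is the kind of gap the authors argue single-shot audits miss; how Proteus actually quantifies adaptive leakage is not specified in this summary.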

Key facts

Proteus is a grey-box self-evolving red-team framework.
It measures adaptive leakage in LLM agent skills.
Agent skills include instructions, tool interfaces, and executable code.
Users install third-party skills from marketplaces and repositories.
Single-shot audits and prompt-level red teams alone are insufficient to assess deployment risk.
Attackers can iteratively rewrite skills using feedback.
Proteus searches a five-axis skill-attack space.
The framework uses an audit-sandbox-oracle pipeline.
The paper is on arXiv with ID 2605.11891.
