Behavioral Firewall for AI Agents Achieves 0% Attack Success Rate Against Multi-Step Attacks
A recent arXiv preprint (2604.26274) presents Codename, a telemetry-driven firewall that performs behavioral anomaly detection for AI agents running structured workflows. The system builds a parameterized deterministic finite automaton (pDFA) from verified benign tool-call telemetry; the pDFA encodes the permitted tool sequences, contexts, and parameter bounds. At runtime, a lightweight gateway enforces these constraints through O(1) state-transition lookups, so the expensive analysis happens offline. On the Agent Security Bench (ASB), Codename records a macro-averaged attack success rate (ASR) of 5.6% across five scenarios. In three structured workflows, the ASR falls to 2.2%, compared with 12.8% for Aegis, a leading stateless scanner. Against multi-step and context-sequential attacks, Codename registers a 0% ASR.
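To make the pDFA idea concrete, here is a minimal sketch of how such an automaton might be learned from benign traces and then enforced with a single dictionary lookup per tool call. It is not the preprint's implementation. Everything named here (`WorkflowAutomaton`, `ParamLimit`, the tool names, and the choice to use the previous tool as the automaton state and numeric min/max ranges as parameter bounds) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ParamLimit:
    """Numeric range observed for one parameter in benign telemetry (assumed form of a bound)."""
    low: float
    high: float

    def widen(self, value: float) -> None:
        self.low = min(self.low, value)
        self.high = max(self.high, value)

    def allows(self, value: float) -> bool:
        return self.low <= value <= self.high


class WorkflowAutomaton:
    """Hypothetical pDFA: the state is simply the name of the last tool called."""
    START = "<start>"

    def __init__(self) -> None:
        # (current_state, tool_name) -> {param_name: ParamLimit}
        self.transitions: dict[tuple[str, str], dict[str, ParamLimit]] = {}

    def learn(self, trace: list[tuple[str, dict[str, float]]]) -> None:
        """Offline step: fold one benign trace of (tool, params) calls into the automaton."""
        state = self.START
        for tool, params in trace:
            limits = self.transitions.setdefault((state, tool), {})
            for name, value in params.items():
                if name in limits:
                    limits[name].widen(value)
                else:
                    limits[name] = ParamLimit(value, value)
            state = tool

    def check(self, state: str, tool: str, params: dict[str, float]) -> tuple[bool, str]:
        """Runtime step: one average-case O(1) dict lookup, then per-parameter range checks.

        Returns (allowed, next_state); the state is unchanged when the call is blocked.
        """
        limits = self.transitions.get((state, tool))
        if limits is None:
            return False, state  # transition never seen in benign telemetry
        for name, value in params.items():
            limit = limits.get(name)
            if limit is None or not limit.allows(value):
                return False, state  # unknown parameter or value outside learned range
        return True, tool


# Illustrative usage with made-up tool names and values.
firewall = WorkflowAutomaton()
firewall.learn([("search_orders", {"limit": 10}), ("issue_refund", {"amount": 50.0})])
firewall.learn([("search_orders", {"limit": 25}), ("issue_refund", {"amount": 120.0})])

allowed, state = firewall.check(WorkflowAutomaton.START, "search_orders", {"limit": 20})
print(allowed, state)
print(firewall.check(state, "issue_refund", {"amount": 5000.0}))
```

In this sketch a refund requested before any order search is rejected because the transition from the start state was never observed, which is the kind of out-of-sequence call a stateless scanner has no context to catch.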
Key facts
- arXiv:2604.26274v1
- Codename is a telemetry-driven behavioral anomaly detection firewall
- Uses parameterized deterministic finite automaton (pDFA)
- Runtime enforcement via O(1) state-transition structural lookup
- Evaluated on Agent Security Bench (ASB)
- 5.6% macro-averaged attack success rate across five scenarios
- 2.2% ASR in three structured workflows
- Outperforms Aegis (12.8% ASR)
- 0% ASR on multi-step and context-sequential attacks
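For readers unfamiliar with the term, a macro-averaged ASR is the unweighted mean of per-scenario rates, so each scenario counts equally regardless of how many attacks it contains. The short sketch below contrasts it with a pooled (micro) average. The per-scenario counts are invented purely for illustration and are not taken from the preprint or from ASB.

```python
def attack_success_rate(successes: int, attempts: int) -> float:
    """ASR for one scenario: fraction of attack attempts that succeeded."""
    return successes / attempts


def macro_average(counts: list[tuple[int, int]]) -> float:
    """Mean of per-scenario ASRs; every scenario has equal weight."""
    rates = [attack_success_rate(s, a) for s, a in counts]
    return sum(rates) / len(rates)


def micro_average(counts: list[tuple[int, int]]) -> float:
    """Pooled ASR; scenarios with more attempts have more weight."""
    total_successes = sum(s for s, _ in counts)
    total_attempts = sum(a for _, a in counts)
    return total_successes / total_attempts


# Hypothetical (successes, attempts) pairs for three scenarios.
example_counts = [(1, 10), (0, 40), (5, 50)]
print(f"macro: {macro_average(example_counts):.4f}")
print(f"micro: {micro_average(example_counts):.4f}")
```

The two figures generally differ when scenarios have different numbers of attempts, which is why it matters that the reported 5.6% is a macro average.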
Entities
Institutions
- arXiv
- Agent Security Bench