Behavioral Firewall for AI Agents Achieves 0% Attack Success in Multi-Step Workflows

ai-technology · 2026-04-30

A recent preprint on arXiv (2604.26274) presents Codename, a firewall designed for behavioral anomaly detection in structured-workflow AI agents, driven by telemetry data. This system constructs a parameterized deterministic finite automaton (pDFA) from confirmed benign tool-call telemetry, outlining acceptable tool sequences, contexts, and parameter limits. During operation, a lightweight gateway upholds these constraints through O(1) state-transition lookups, allowing intensive analysis to be conducted offline. Testing on the Agent Security Bench (ASB) reveals that Codename records a macro-averaged attack success rate (ASR) of 5.6% across five scenarios. In three structured workflows, the ASR decreases to 2.2%, surpassing Aegis, a leading stateless scanner, which has an ASR of 12.8%. Codename registers a 0% ASR for multi-step and context-sequential attacks.

Key facts

arXiv:2604.26274v1
Codename is a telemetry-driven behavioral anomaly detection firewall
Uses parameterized deterministic finite automaton (pDFA)
Runtime enforcement via O(1) state-transition structural lookup
Evaluated on Agent Security Bench (ASB)
5.6% macro-averaged attack success rate across five scenarios
2.2% ASR in three structured workflows
Outperforms Aegis (12.8% ASR)
0% ASR on multi-step and context-sequential attacks

Behavioral Firewall for AI Agents Achieves 0% Attack Success in Multi-Step Workflows

Key facts

Entities

Institutions

Sources