ARTFEED — Contemporary Art Intelligence

Behavioral Firewall for AI Agents Achieves 0% Attack Success in Multi-Step Workflows

ai-technology · 2026-04-30

A recent preprint on arXiv (2604.26274) presents Codename, a firewall designed for behavioral anomaly detection in structured-workflow AI agents, driven by telemetry data. This system constructs a parameterized deterministic finite automaton (pDFA) from confirmed benign tool-call telemetry, outlining acceptable tool sequences, contexts, and parameter limits. During operation, a lightweight gateway upholds these constraints through O(1) state-transition lookups, allowing intensive analysis to be conducted offline. Testing on the Agent Security Bench (ASB) reveals that Codename records a macro-averaged attack success rate (ASR) of 5.6% across five scenarios. In three structured workflows, the ASR decreases to 2.2%, surpassing Aegis, a leading stateless scanner, which has an ASR of 12.8%. Codename registers a 0% ASR for multi-step and context-sequential attacks.

Key facts

  • arXiv:2604.26274v1
  • Codename is a telemetry-driven behavioral anomaly detection firewall
  • Uses parameterized deterministic finite automaton (pDFA)
  • Runtime enforcement via O(1) state-transition structural lookup
  • Evaluated on Agent Security Bench (ASB)
  • 5.6% macro-averaged attack success rate across five scenarios
  • 2.2% ASR in three structured workflows
  • Outperforms Aegis (12.8% ASR)
  • 0% ASR on multi-step and context-sequential attacks

Entities

Institutions

  • arXiv
  • Agent Security Bench

Sources