AgentDoG: Diagnostic Guardrail Framework for AI Agent Safety

ai-technology · 2026-04-25

A novel framework named AgentDoG (Diagnostic Guardrail for agent safety and security) has been launched to tackle the safety and security issues associated with AI agents, especially those stemming from their autonomous use of tools and interactions with their surroundings. This framework is outlined in a paper available on arXiv (2601.18491) and introduces a comprehensive three-dimensional taxonomy that classifies agentic risks based on their source (where), mode of failure (how), and consequences (what). This classification supports a new detailed safety benchmark for agents (ATBench) and the AgentDoG framework itself. AgentDoG enables nuanced, contextual oversight of agent behaviors, diagnosing both unsafe actions and those that may appear safe yet are unreasonable, enhancing transparency beyond simple binary classifications. The initiative seeks to elevate risk awareness and transparency in existing guardrail models, which currently lack diagnostic capabilities and agentic risk awareness.

Key facts

AgentDoG is a diagnostic guardrail framework for AI agent safety and security.
It addresses challenges from autonomous tool use and environmental interactions.
A unified three-dimensional taxonomy categorizes risks by source, failure mode, and consequence.
The taxonomy is used to create ATBench, a fine-grained agentic safety benchmark.
AgentDoG provides contextual monitoring across agent trajectories.
It can diagnose root causes of unsafe and unreasonable actions.
The framework offers provenance and transparency beyond binary labels.
Current guardrail models lack agentic risk awareness and transparency.

AgentDoG: Diagnostic Guardrail Framework for AI Agent Safety

Key facts

Entities

Institutions

Sources