ARTFEED — Contemporary Art Intelligence

AgentTrust: Runtime Safety Layer for AI Agent Tool Use

ai-technology · 2026-05-07

A new runtime safety system called AgentTrust intercepts AI agent tool calls before execution to prevent unsafe actions such as file deletion, credential exposure, and data exfiltration. It combines a shell deobfuscation normalizer, SafeFix suggestions for safer alternatives, RiskChain detection for multi-step attacks, and a cache-aware LLM-as-Judge for ambiguous inputs, returning one of four structured verdicts: allow, warn, block, or review. The release includes a 300-scenario benchmark covering six risk categories. Existing defenses are incomplete: post-hoc benchmarks catch failures only after the fact, while static guardrails and infrastructure sandboxes lack semantic understanding of the action being taken. AgentTrust addresses these gaps by evaluating actions in real time.
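To make the interception model concrete, here is a minimal sketch of a pre-execution check that returns the four structured verdicts the article names. All function names, risk patterns, and the escalation rule are illustrative assumptions, not AgentTrust's actual API or rule set.

```python
import re
from dataclasses import dataclass

@dataclass
class Verdict:
    decision: str  # "allow" | "warn" | "block" | "review"
    reason: str

# Toy rules for the risk classes named in the article (block > warn priority).
BLOCK_PATTERNS = [
    (re.compile(r"\brm\s+-rf\s+/"), "destructive file deletion"),
    (re.compile(r"(?i)(api[_-]?key|password)\s*="), "credential exposure"),
]
WARN_PATTERNS = [
    (re.compile(r"\bcurl\b.*\|\s*(?:sh|bash)"), "possible exfiltration / remote code"),
]

def intercept(tool_call: str) -> Verdict:
    """Evaluate a tool call BEFORE execution and return a structured verdict."""
    for pat, why in BLOCK_PATTERNS:
        if pat.search(tool_call):
            return Verdict("block", why)
    for pat, why in WARN_PATTERNS:
        if pat.search(tool_call):
            return Verdict("warn", why)
    if not tool_call.isascii():
        # Ambiguous input: defer to a slower LLM-as-Judge stage (not shown).
        return Verdict("review", "ambiguous input, defer to judge")
    return Verdict("allow", "no risk pattern matched")

print(intercept("rm -rf / --no-preserve-root").decision)  # block
print(intercept("ls -la").decision)                       # allow
```

The point of the sketch is the control flow: cheap pattern checks run first, and only genuinely ambiguous inputs escalate to the expensive judge, which is consistent with the cache-aware design the article describes.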

Key facts

  • AgentTrust intercepts tool calls before execution
  • Returns verdicts: allow, warn, block, or review
  • Includes shell deobfuscation normalizer
  • SafeFix suggests safer alternatives
  • RiskChain detects multi-step attack chains
  • Cache-aware LLM-as-Judge for ambiguous inputs
  • 300-scenario benchmark across six risk categories
  • Addresses gaps in post-hoc benchmarks, static guardrails, and sandboxes
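The deobfuscation normalizer mentioned above can be illustrated with one common evasion: a payload hidden behind `base64 -d | sh`. The sketch below unwraps such a command so that downstream risk rules see the real payload; the regex and recursion strategy are assumptions for illustration, not the system's documented normalizer.

```python
import base64
import re

# Matches: echo <b64> | base64 -d | sh   (or --decode / bash variants)
B64_EXEC = re.compile(
    r"echo\s+['\"]?([A-Za-z0-9+/=]+)['\"]?\s*\|\s*base64\s+"
    r"(?:-d|--decode)\s*\|\s*(?:sh|bash)"
)

def normalize(cmd: str) -> str:
    """Unwrap base64-piped shell commands before risk-pattern matching."""
    m = B64_EXEC.search(cmd)
    if m:
        try:
            decoded = base64.b64decode(m.group(1)).decode("utf-8", "replace")
            return normalize(decoded)  # recurse: obfuscation may be layered
        except Exception:
            return cmd                 # undecodable payload: leave for review
    return cmd

payload = base64.b64encode(b"rm -rf /tmp/work").decode()
print(normalize(f"echo {payload} | base64 -d | sh"))  # rm -rf /tmp/work
```

Recursing on the decoded output matters because attackers can nest encodings; a single-pass decoder would still hand an obfuscated string to the pattern matcher.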

Entities

Institutions

  • arXiv
