ARTFEED — Contemporary Art Intelligence

AgentTrust: Runtime Safety Layer for AI Agent Tool Use

ai-technology · 2026-05-07

A new runtime safety system called AgentTrust intercepts AI agent tool calls before execution to prevent unsafe actions such as file deletion, credential exposure, and data exfiltration. It combines a shell deobfuscation normalizer, SafeFix suggestions for safer alternatives, RiskChain detection for multi-step attacks, and a cache-aware LLM-as-Judge for ambiguous inputs, returning one of four structured verdicts: allow, warn, block, or review. The release includes a 300-scenario benchmark covering six risk categories. Existing defenses are incomplete: post-hoc benchmarks catch failures only after the fact, while static guardrails and infrastructure sandboxes lack semantic understanding of the action being taken. AgentTrust addresses these gaps by evaluating actions in real time.
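To make the interception model concrete, here is a minimal sketch of a pre-execution check that returns the four structured verdicts the article names. All function names, risk patterns, and the escalation rule are illustrative assumptions, not AgentTrust's actual API or rule set.

```python
import re
from dataclasses import dataclass

@dataclass
class Verdict:
    decision: str  # "allow" | "warn" | "block" | "review"
    reason: str

# Toy rules for the risk classes named in the article (block > warn priority).
BLOCK_PATTERNS = [
    (re.compile(r"\brm\s+-rf\s+/"), "destructive file deletion"),
    (re.compile(r"(?i)(api[_-]?key|password)\s*="), "credential exposure"),
]
WARN_PATTERNS = [
    (re.compile(r"\bcurl\b.*\|\s*(?:sh|bash)"), "possible exfiltration / remote code"),
]

def intercept(tool_call: str) -> Verdict:
    """Evaluate a tool call BEFORE execution and return a structured verdict."""
    for pat, why in BLOCK_PATTERNS:
        if pat.search(tool_call):
            return Verdict("block", why)
    for pat, why in WARN_PATTERNS:
        if pat.search(tool_call):
            return Verdict("warn", why)
    if not tool_call.isascii():
        # Ambiguous input: defer to a slower LLM-as-Judge stage (not shown).
        return Verdict("review", "ambiguous input, defer to judge")
    return Verdict("allow", "no risk pattern matched")

print(intercept("rm -rf / --no-preserve-root").decision)  # block
print(intercept("ls -la").decision)                       # allow
```

The point of the sketch is the control flow: cheap pattern checks run first, and only genuinely ambiguous inputs escalate to the expensive judge, which is consistent with the cache-aware design the article describes.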

Key facts

  • AgentTrust intercepts tool calls before execution
  • Returns verdicts: allow, warn, block, or review
  • Includes shell deobfuscation normalizer
  • SafeFix suggests safer alternatives
  • RiskChain detects multi-step attack chains
  • Cache-aware LLM-as-Judge for ambiguous inputs
  • 300-scenario benchmark across six risk categories
  • Addresses gaps in post-hoc benchmarks, static guardrails, and sandboxes
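The deobfuscation normalizer mentioned above can be illustrated with one common evasion: a payload hidden behind `base64 -d | sh`. The sketch below unwraps such a command so that downstream risk rules see the real payload; the regex and recursion strategy are assumptions for illustration, not the system's documented normalizer.

```python
import base64
import re

# Matches: echo <b64> | base64 -d | sh   (or --decode / bash variants)
B64_EXEC = re.compile(
    r"echo\s+['\"]?([A-Za-z0-9+/=]+)['\"]?\s*\|\s*base64\s+"
    r"(?:-d|--decode)\s*\|\s*(?:sh|bash)"
)

def normalize(cmd: str) -> str:
    """Unwrap base64-piped shell commands before risk-pattern matching."""
    m = B64_EXEC.search(cmd)
    if m:
        try:
            decoded = base64.b64decode(m.group(1)).decode("utf-8", "replace")
            return normalize(decoded)  # recurse: obfuscation may be layered
        except Exception:
            return cmd                 # undecodable payload: leave for review
    return cmd

payload = base64.b64encode(b"rm -rf /tmp/work").decode()
print(normalize(f"echo {payload} | base64 -d | sh"))  # rm -rf /tmp/work
```

Recursing on the decoded output matters because attackers can nest encodings; a single-pass decoder would still hand an obfuscated string to the pattern matcher.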

Entities

Institutions

  • arXiv
