Three-Layer Safety Architecture Required for LLM Agents

ai-technology · 2026-05-20

A new position paper argues that deployed LLM agents need a three-layer probabilistic assume-guarantee architecture for safety, not a single guardrail. The paper, posted on arXiv (2605.18672v1), contends that safety comprises three distinct dimensions: semantic intent and policy compliance, environmental validity, and dynamical feasibility. Each dimension depends on information that becomes available at a different stage of execution, so no single abstraction layer can certify all three. The authors propose a contract-based architecture in which each dimension is enforced by an independently certified layer whose probabilistic guarantee satisfies the assumptions of the next layer. From these contracts they derive compositional, system-level safety bounds via the chain rule of probability. The paper asserts that this layered structure follows necessarily from how agent execution works and is not a contingent limitation.
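As a worked illustration of how such a bound composes, consider the following notation. It is our own and not necessarily the paper's. Let G_1, G_2 and G_3 be the events that the semantic, environmental and dynamical layers meet their guarantees. Assume each layer i certifies P(G_i | G_1 ∩ … ∩ G_{i-1}) ≥ 1 − ε_i, which means its guarantee holds whenever the earlier layers' guarantees (its assumptions) hold. The chain rule of probability then yields a system-level lower bound:

```latex
\Pr(G_1 \cap G_2 \cap G_3)
  = \Pr(G_1)\,\Pr(G_2 \mid G_1)\,\Pr(G_3 \mid G_1 \cap G_2)
  \ge (1-\varepsilon_1)(1-\varepsilon_2)(1-\varepsilon_3)
```

For example, if each layer has a hypothetical ε_i = 0.01, the bound is 0.99^3 ≈ 0.9703. The exact form of the paper's bounds may differ from this sketch.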

Key facts

Paper argues single abstraction layer is categorically insufficient for LLM agent safety.
Three safety dimensions: semantic intent/policy compliance, environmental validity, dynamical feasibility.
Each dimension depends on information from different execution stages.
Proposes contract-based architecture with independently certified layers.
Safety bounds derived via chain rule of probability.
Posted on arXiv with ID 2605.18672v1.
Claims structural necessity, not contingent limitation.
Focus on deployed LLM agents.
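The contract-based layering summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The `LayerContract` type, the stage labels, the per-layer `epsilon` values and the example checks are all hypothetical. Each layer inspects information from its own execution stage. Each certified failure probability is treated as conditional on the earlier layers holding, so the chain rule reduces the system bound to a product.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, List, Optional, Tuple

# Hypothetical layer contract: names, stages and epsilon values are
# illustrative assumptions, not taken from the paper.
@dataclass(frozen=True)
class LayerContract:
    name: str                      # safety dimension this layer certifies
    stage: str                     # execution stage whose information it inspects
    epsilon: float                 # certified failure probability, given its assumptions hold
    check: Callable[[dict], bool]  # runtime check on the information available at `stage`

def system_safety_bound(layers: List[LayerContract]) -> float:
    """Chain-rule lower bound on P(all guarantees hold), assuming each
    layer's guarantee is conditional on all earlier layers' guarantees."""
    for layer in layers:
        if not 0.0 <= layer.epsilon <= 1.0:
            raise ValueError(f"epsilon out of range for {layer.name}")
    return prod(1.0 - layer.epsilon for layer in layers)

def gate(layers: List[LayerContract], context: dict) -> Tuple[bool, Optional[str]]:
    """Apply the layers in execution order; report the first one that fails."""
    for layer in layers:
        if not layer.check(context):
            return False, layer.name
    return True, None

# Hypothetical example: one layer per safety dimension named in the paper.
layers = [
    LayerContract("semantic intent / policy compliance", "plan", 0.01,
                  lambda ctx: ctx["tool"] in ctx["allowed_tools"]),
    LayerContract("environmental validity", "pre-execution state", 0.02,
                  lambda ctx: ctx["target"] in ctx["existing_objects"]),
    LayerContract("dynamical feasibility", "actuation", 0.005,
                  lambda ctx: abs(ctx["velocity"]) <= ctx["max_velocity"]),
]

ok_ctx = {
    "tool": "move_arm", "allowed_tools": {"move_arm"},
    "target": "cup", "existing_objects": {"cup", "table"},
    "velocity": 0.5, "max_velocity": 1.0,
}

print(system_safety_bound(layers))  # product 0.99 * 0.98 * 0.995, about 0.9653
print(gate(layers, ok_ctx))         # (True, None)
```

The sketch keeps each layer's check independent, which mirrors the idea of independently certified layers. A context that exceeds the velocity limit would pass the first two layers and be stopped by the dynamical-feasibility layer.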