ARTFEED — Contemporary Art Intelligence

Separation-of-Powers Architecture for AI Agent Safety

ai-technology · 2026-04-29

A recent study posted to arXiv (2604.23646) introduces the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers design for AI agent safety at the system level. The researchers contend that current techniques such as RLHF and constitutional prompting offer only probabilistic assurances against agentic misalignment, in which advanced AI systems generate and carry out harmful actions in pursuit of self-originated objectives. PEA decouples intent generation, authorization, and execution into distinct layers linked by cryptographically constrained capability tokens. The paper outlines five key contributions, including an Intent Verification Layer (IVL) that checks each capability against the intent it authorizes, and Intent Lineage Tracking (ILT), which binds every executable intent to an originating user request through cryptographic anchors. The aim is to enforce goal integrity structurally and to prevent unauthorized or misaligned actions.
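
To picture the layered flow, here is a minimal Python sketch in which an authorization layer signs an approved intent and an execution layer refuses anything it cannot verify. It assumes an HMAC-based token scheme; the token format, key handling, and names such as issue_token are illustrative, not taken from the paper.

    import hmac, hashlib, json

    AUTH_KEY = b"authorization-layer-secret"  # held by the authorization layer

    def issue_token(intent: dict) -> bytes:
        """Authorization layer: sign an approved intent as a capability token."""
        payload = json.dumps(intent, sort_keys=True).encode()
        return hmac.new(AUTH_KEY, payload, hashlib.sha256).digest()

    def verify_token(intent: dict, token: bytes) -> bool:
        """Execution layer: act only on intents whose token checks out."""
        payload = json.dumps(intent, sort_keys=True).encode()
        expected = hmac.new(AUTH_KEY, payload, hashlib.sha256).digest()
        return hmac.compare_digest(expected, token)

    # The policy layer proposes an intent; authorization issues a token for it.
    intent = {"action": "send_email", "recipient": "user@example.com"}
    token = issue_token(intent)
    assert verify_token(intent, token)

    # An intent modified after authorization no longer matches its token.
    tampered = dict(intent, recipient="attacker@example.com")
    assert not verify_token(tampered, token)

A real deployment would presumably use asymmetric signatures so the execution layer can verify tokens without holding a key that could mint them; the shared HMAC key above only keeps the sketch short.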

Key facts

  • arXiv paper 2604.23646 proposes PEA architecture
  • PEA is a separation-of-powers design for AI agent safety
  • Existing methods like RLHF and constitutional prompting offer only probabilistic assurances
  • PEA decouples intent generation, authorization, and execution
  • Layers are connected via cryptographically constrained capability tokens
  • Includes an Intent Verification Layer (IVL) that checks each capability against the intent it authorizes
  • Intent Lineage Tracking (ILT) binds intents to originating user requests via cryptographic anchors (sketched after this list)
  • Aims to structurally enforce goal integrity
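
To make the lineage idea concrete, the following sketch chains each intent to its parent by hashing it together with the parent's anchor, so any executable intent can be traced back to a user-request root. The construction is hypothetical; this summary does not describe the paper's actual anchoring scheme.

    import hashlib, json

    def anchor(record: dict, parent: str) -> str:
        """Cryptographic anchor: hash of a record bound to its parent's anchor."""
        payload = (parent + json.dumps(record, sort_keys=True)).encode()
        return hashlib.sha256(payload).hexdigest()

    # The original user request is the root of the lineage.
    root = anchor({"user_request": "summarize my inbox"}, parent="")

    # Each derived intent must chain back to that root to be executable.
    sub_intent = {"intent": "read_mailbox", "scope": "inbox"}
    sub_anchor = anchor(sub_intent, parent=root)
    assert anchor(sub_intent, parent=root) == sub_anchor

    # A self-generated intent with no valid parent cannot reproduce a matching
    # anchor, so a verifier recomputing the chain can reject it before execution.
    orphan = {"intent": "exfiltrate_contacts"}
    assert anchor(orphan, parent=root) != sub_anchor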

Entities

Institutions

  • arXiv

Sources

  • arXiv:2604.23646 (https://arxiv.org/abs/2604.23646)