MAGE Framework Protects LLM Agents from Long-Horizon Threats
Researchers have developed MAGE (Memory As Guardrail Enforcement), a defensive framework that protects large language model (LLM)-powered agents from long-horizon threats. These attacks exploit extended interactions among users, agents, and environments to achieve malicious goals that would be infeasible in single-turn settings, putting critical deployments at risk. Inspired by the shadow stack abstraction in systems security, MAGE maintains a dedicated agentic memory that captures and preserves essential safety context across the agent's entire execution trajectory, and this shadow memory proactively assesses the risk of pending actions before they execute. In evaluations, MAGE significantly outperforms existing defenses across diverse attack scenarios, addressing a growing class of threats as LLM agents are deployed for complex, real-world tasks.
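The summary above describes MAGE's core mechanism, a parallel safety-only memory that logs the trajectory and vets each pending action, but gives no concrete interfaces. The sketch below illustrates that general pattern in Python; all names and values (ShadowMemory, Action, assess, the thresholds) are assumptions for illustration, not MAGE's actual design.

```python
# Minimal sketch of a shadow-memory guardrail. Class/method names and the
# keyword-free, score-based risk model are illustrative assumptions, not
# MAGE's published API.
from dataclasses import dataclass, field

@dataclass
class Action:
    tool: str          # e.g. "send_email", "read_file"
    args: dict
    risk_score: float  # per-step risk estimate in [0, 1]

@dataclass
class ShadowMemory:
    """Safety-only log kept alongside the agent's working memory,
    analogous to a shadow stack tracking return addresses."""
    history: list = field(default_factory=list)
    step_threshold: float = 0.8     # block any single highly risky action
    horizon_threshold: float = 2.0  # block slow, multi-step escalation

    def assess(self, pending: Action) -> bool:
        """Return True if `pending` may execute, False to block it.
        Risk is judged over the whole trajectory, not just this step,
        so many individually benign steps can still trip the guardrail."""
        if pending.risk_score >= self.step_threshold:
            return False
        cumulative = sum(a.risk_score for a in self.history) + pending.risk_score
        if cumulative >= self.horizon_threshold:
            return False
        self.history.append(pending)  # record the approved step
        return True

# Usage: the agent consults the shadow memory before each tool call.
guard = ShadowMemory()
step = Action(tool="read_file", args={"path": "notes.txt"}, risk_score=0.1)
if guard.assess(step):
    pass  # safe to execute the tool call here
```

The key design choice this sketch captures is that the safety log is separate from the agent's task memory, so a compromised or manipulated working context cannot rewrite the risk history that the guardrail consults.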
Key facts
- MAGE stands for Memory As Guardrail Enforcement
- It is a defensive framework for LLM-powered agents
- Targets long-horizon threats exploiting extended interactions
- Inspired by shadow stack abstraction in systems security
- Maintains a dedicated safety-focused agentic memory
- Proactively assesses the risk of each pending action before execution (see the loop sketch after this list)
- Outperforms existing defenses across diverse attack scenarios in evaluations
- Addresses risks in critical domain deployments
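To show where pre-execution vetting sits in practice, here is a hypothetical agent loop that consults the ShadowMemory from the sketch above before every tool call; plan_next_action and execute are stand-in stubs for the agent's planner and executor, which the source does not specify.

```python
from typing import Optional

def plan_next_action(task: str) -> Optional[Action]:
    """Stub planner: a real agent would query the LLM for the next step."""
    return None  # returning None ends the loop in this stub

def execute(step: Action) -> None:
    """Stub executor: a real agent would invoke the named tool."""
    print(f"executing {step.tool} with {step.args}")

def run_agent(guard: ShadowMemory, task: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        step = plan_next_action(task)
        if step is None:
            break  # nothing left to do
        if not guard.assess(step):
            print(f"blocked risky action: {step.tool}")
            break  # a real agent might replan here instead of stopping
        execute(step)

run_agent(ShadowMemory(), task="summarize today's inbox")
```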