MAGE Framework Protects LLM Agents from Long-Horizon Threats
Researchers have developed MAGE (Memory As Guardrail Enforcement), a defensive framework that protects large language model (LLM)-powered agents from long-horizon threats. These attacks exploit extended interactions among users, agents, and environments to achieve malicious goals that would be infeasible in single-turn settings, putting critical deployments at risk. Inspired by the shadow stack abstraction in systems security, MAGE maintains a dedicated agentic memory that captures and preserves essential safety context across the agent's entire execution trajectory, and this shadow memory proactively assesses the risk of pending actions before they execute. In evaluations, MAGE significantly outperforms existing defenses across diverse attack scenarios, addressing a growing class of threats as LLM agents are deployed for complex, real-world tasks.
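The summary above describes MAGE's core mechanism, a parallel safety-only memory that logs the trajectory and vets each pending action, but gives no concrete interfaces. The sketch below illustrates that general pattern in Python; all names and values (ShadowMemory, Action, assess, the thresholds) are assumptions for illustration, not MAGE's actual design.

```python
# Minimal sketch of a shadow-memory guardrail. Class/method names and the
# keyword-free, score-based risk model are illustrative assumptions, not
# MAGE's published API.
from dataclasses import dataclass, field

@dataclass
class Action:
    tool: str          # e.g. "send_email", "read_file"
    args: dict
    risk_score: float  # per-step risk estimate in [0, 1]

@dataclass
class ShadowMemory:
    """Safety-only log kept alongside the agent's working memory,
    analogous to a shadow stack tracking return addresses."""
    history: list = field(default_factory=list)
    step_threshold: float = 0.8     # block any single highly risky action
    horizon_threshold: float = 2.0  # block slow, multi-step escalation

    def assess(self, pending: Action) -> bool:
        """Return True if `pending` may execute, False to block it.
        Risk is judged over the whole trajectory, not just this step,
        so many individually benign steps can still trip the guardrail."""
        if pending.risk_score >= self.step_threshold:
            return False
        cumulative = sum(a.risk_score for a in self.history) + pending.risk_score
        if cumulative >= self.horizon_threshold:
            return False
        self.history.append(pending)  # record the approved step
        return True

# Usage: the agent consults the shadow memory before each tool call.
guard = ShadowMemory()
step = Action(tool="read_file", args={"path": "notes.txt"}, risk_score=0.1)
if guard.assess(step):
    pass  # safe to execute the tool call here
```

The key design choice this sketch captures is that the safety log is separate from the agent's task memory, so a compromised or manipulated working context cannot rewrite the risk history that the guardrail consults.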
Key facts
- MAGE stands for Memory As Guardrail Enforcement
- It is a defensive framework for LLM-powered agents
- Targets long-horizon threats exploiting extended interactions
- Inspired by shadow stack abstraction in systems security
- Maintains a dedicated safety-focused agentic memory
- Proactively assesses the risk of each pending action before execution (see the loop sketch after this list)
- Outperforms existing defenses across diverse attack scenarios in evaluations
- Addresses risks in critical domain deployments
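To show where pre-execution vetting sits in practice, here is a hypothetical agent loop that consults the ShadowMemory from the sketch above before every tool call; plan_next_action and execute are stand-in stubs for the agent's planner and executor, which the source does not specify.

```python
from typing import Optional

def plan_next_action(task: str) -> Optional[Action]:
    """Stub planner: a real agent would query the LLM for the next step."""
    return None  # returning None ends the loop in this stub

def execute(step: Action) -> None:
    """Stub executor: a real agent would invoke the named tool."""
    print(f"executing {step.tool} with {step.args}")

def run_agent(guard: ShadowMemory, task: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        step = plan_next_action(task)
        if step is None:
            break  # nothing left to do
        if not guard.assess(step):
            print(f"blocked risky action: {step.tool}")
            break  # a real agent might replan here instead of stopping
        execute(step)

run_agent(ShadowMemory(), task="summarize today's inbox")
```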