FORGE: LLM Agent Memory Evolution via Population Broadcast

ai-technology · 2026-05-18

FORGE (Failure-Optimized Reflective Graduation and Evolution) is a new protocol that lets LLM agents improve their decision-making through self-generated memory, with no weight updates. It takes a population-based approach in which natural-language memory, injected into each agent's prompt, evolves across stages. An inner Reflexion loop converts failed trajectories into reusable artifacts (rules, examples, or a mix of both), while an outer loop propagates the best-performing memory across the population and freezes instances that have converged. FORGE was tested on CybORG CAGE-2, a stochastic network-defense POMDP, with four LLM families, including Gemini-2.5-Flash-Lite, Grok-4-Fast, and Llama-4-Maverick, and is reported to improve performance without any gradient updates.

Key facts

FORGE stands for Failure-Optimized Reflective Graduation and Evolution
No weight updates are used; memory evolves via prompt injection
Inner loop uses Reflexion-style reflection on failed trajectories
Memory artifacts include Rules, Examples, or Mixed
Outer loop propagates best-performing memory across population
Graduation criterion freezes converged instances
Evaluated on CybORG CAGE-2 at a 30-step horizon against the B-line attacker
Tested with Gemini-2.5-Flash-Lite, Grok-4-Fast, Llama-4-Maverick
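
The two-loop structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_episode` is a toy stand-in for a CAGE-2 rollout, `reflect` stands in for the LLM reflection call, and the patience-based graduation rule, the failure threshold, and every name below are hypothetical, since the summary does not specify them.

```python
import random
from dataclasses import dataclass, field

FAILURE_THRESHOLD = 8.0  # hypothetical: episodes scoring below this count as failures


@dataclass
class Agent:
    memory: list[str] = field(default_factory=list)  # prompt-injected text artifacts
    best_score: float = float("-inf")
    stale_stages: int = 0
    graduated: bool = False


def run_episode(memory: list[str], rng: random.Random) -> float:
    """Toy environment stand-in: each memory item (up to 5) raises the score."""
    return 2.0 * min(len(memory), 5) + rng.random()


def reflect(memory: list[str], score: float) -> str:
    """Stand-in for an LLM call that turns a failed trajectory into a rule."""
    return f"rule {len(memory) + 1}: avoid the pattern that scored {score:.1f}"


def inner_loop(agent: Agent, rng: random.Random, trials: int = 3) -> float:
    """Reflexion-style loop: act, and on failure append a new memory artifact."""
    best = float("-inf")
    for _ in range(trials):
        score = run_episode(agent.memory, rng)
        best = max(best, score)
        if score < FAILURE_THRESHOLD:
            agent.memory.append(reflect(agent.memory, score))
    return best


def forge(population_size: int = 4, stages: int = 6,
          patience: int = 2, seed: int = 0) -> list[Agent]:
    rng = random.Random(seed)
    population = [Agent() for _ in range(population_size)]
    for _ in range(stages):
        active = [a for a in population if not a.graduated]
        if not active:
            break
        for agent in active:
            score = inner_loop(agent, rng)
            if score > agent.best_score:
                agent.best_score, agent.stale_stages = score, 0
            else:
                agent.stale_stages += 1
            # Assumed graduation rule: freeze after `patience` stages without gain.
            agent.graduated = agent.stale_stages >= patience
        # Outer loop: broadcast the champion's memory to agents still evolving.
        champion = max(population, key=lambda a: a.best_score)
        for agent in population:
            if not agent.graduated and agent is not champion:
                agent.memory = list(champion.memory)
    return population


if __name__ == "__main__":
    for i, agent in enumerate(forge()):
        print(i, agent.graduated, len(agent.memory), round(agent.best_score, 2))
```

Note that nothing in the sketch touches model weights: the only state that changes between stages is the list of strings in `memory`, which is the property the protocol relies on.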
