APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

ai-technology · 2026-05-22

A new framework, Autonomous Policy EXploration (APEX), targets exploration collapse in self-evolving large language model (LLM) agents. Such agents improve across episodes by accumulating memory and reflections on past experience, without updating model weights. In practice, however, they tend to settle into familiar high-reward patterns, which keeps them from discovering better alternatives. APEX counters this by keeping the strategy space explicit in a strategy map: a directed acyclic graph of milestones linked by prerequisite dependencies. A Fork Discovery component expands the map with unexplored, evidence-grounded directions, and a Policy Selection component balances exploration and exploitation during planning. In evaluations on nine tasks from Jericho, a suite of text-based interactive fiction games, APEX showed performance improvements over baseline methods. The paper is available on arXiv (2605.21240).
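At its core, the strategy map described above is a dependency graph. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the class and method names (`StrategyMap`, `add_milestone`, `frontier`), the `evidence` field, and the text-adventure milestone names are all hypothetical. It keeps the graph acyclic by letting a milestone depend only on milestones that already exist, and it lists which milestones are currently reachable given those already achieved.

```python
from dataclasses import dataclass, field


@dataclass
class Milestone:
    name: str
    prerequisites: frozenset
    evidence: str = ""  # hypothetical: observation that motivated this direction


@dataclass
class StrategyMap:
    """Hypothetical strategy map: a DAG of milestones with prerequisite edges."""

    milestones: dict = field(default_factory=dict)

    def add_milestone(self, name, prerequisites=(), evidence=""):
        # Only existing milestones may be prerequisites, and names are never
        # re-added, so every edge points from an older node to a newer one:
        # the graph stays acyclic by construction.
        if name in self.milestones:
            raise ValueError(f"duplicate milestone: {name}")
        prereqs = frozenset(prerequisites)
        missing = prereqs.difference(self.milestones)
        if missing:
            raise KeyError(f"unknown prerequisites: {sorted(missing)}")
        self.milestones[name] = Milestone(name, prereqs, evidence)

    def frontier(self, achieved):
        """Return unachieved milestones whose prerequisites are all achieved."""
        done = set(achieved)
        return sorted(
            m.name
            for m in self.milestones.values()
            if m.name not in done and m.prerequisites <= done
        )


# Usage with invented, text-adventure-style milestones.
smap = StrategyMap()
smap.add_milestone("get_lamp")
smap.add_milestone("open_window")
smap.add_milestone("enter_house", ["open_window"])
smap.add_milestone("enter_cellar", ["enter_house", "get_lamp"])
# A Fork Discovery-like step could add an alternative branch with its evidence.
smap.add_milestone("climb_tree", evidence="a climbable tree was observed")

print(smap.frontier([]))               # ['climb_tree', 'get_lamp', 'open_window']
print(smap.frontier(["open_window"]))  # ['climb_tree', 'enter_house', 'get_lamp']
```

Requiring prerequisites to exist before a milestone is added is a cheap way to rule out cycles, since every edge points from an older milestone to a newer one. A system that revises dependencies after insertion would need an explicit cycle check instead.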

Key facts

APEX stands for Autonomous Policy EXploration.
It addresses exploration collapse in self-evolving LLM agents.
Self-evolving agents accumulate memory and reflection across episodes without weight updates.
APEX uses a strategy map: a directed acyclic graph of milestones with prerequisite dependencies.
Fork Discovery expands the map with evidence-grounded unexplored directions.
Policy Selection balances exploration and exploitation during planning.
Evaluated on nine tasks in the Jericho environment.
Paper available on arXiv: 2605.21240.
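The source does not say how Policy Selection trades off exploration and exploitation. As one hypothetical illustration, the sketch below swaps in UCB1, a standard multi-armed bandit rule, to choose which milestone to plan toward: each candidate's average past reward is boosted by a bonus that shrinks the more often it has been tried. The function name `select_milestone`, the `(attempts, total_reward)` memory format, and the exploration weight `c` are assumptions, not details from the paper.

```python
import math


def select_milestone(candidates, stats, c=1.4):
    """Hypothetical selector: pick a milestone via UCB1.

    candidates: milestone names to plan toward (e.g. ones whose
        prerequisites are already met)
    stats: dict name -> (times_attempted, total_reward) from past episodes
    c: exploration weight; larger values favor rarely tried milestones
    """
    total = sum(stats.get(m, (0, 0.0))[0] for m in candidates)
    best, best_score = None, -math.inf
    for m in candidates:
        n, reward_sum = stats.get(m, (0, 0.0))
        if n == 0:
            return m  # never attempted: explore it before scoring others
        # Exploitation term (mean reward) plus exploration bonus.
        score = reward_sum / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = m, score
    return best


# Invented episode memory: one reliable milestone, one barely tried.
memory = {
    "get_lamp": (10, 9.0),
    "climb_tree": (1, 0.5),
}
options = ["get_lamp", "climb_tree"]

print(select_milestone(options, memory))         # climb_tree
print(select_milestone(options, memory, c=0.0))  # get_lamp
```

With `c=0.0` the rule reduces to greedy exploitation and keeps returning the familiar high-reward milestone, which is the kind of collapse the article describes; a positive `c` lets the rarely tried branch win.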