PIVOT: Self-Supervised Trajectory Refinement for LLM Agents

ai-technology · 2026-05-13

PIVOT (Plan-Inspect-eVOlve Trajectories) is a self-supervised framework that refines agent trajectories iteratively through interaction with the environment. It targets plan-execution misalignment in LLM-based agents, which often produce coherent plans that fail in execution because of infeasible actions, constraint violations, and compounding errors. The framework has four phases: PLAN generates candidate trajectories; INSPECT executes them while computing structured losses using textual gradients; EVOLVE revises the trajectories based on these signals; and VERIFY performs a comprehensive final assessment. A monotonic acceptance process ensures that solution quality does not decrease. Experiments on the DeepPlanning and GAIA benchmarks show state-of-the-art performance, particularly with human-in-the-loop feedback. The paper is available on arXiv under ID 2605.11225.
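The four phases and the monotonic acceptance rule can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `refine`, `Feedback`, and the toy `plan`/`inspect`/`evolve`/`verify` callables are hypothetical, the scalar `loss` with a free-text `critique` is only one assumed way to represent a structured loss with a textual gradient, and accepting a candidate only when its loss strictly improves is an assumed form of monotonic acceptance.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Trajectory = List[str]  # assumed representation: an ordered list of action names


@dataclass
class Feedback:
    loss: float    # assumed scalar summary of execution problems; lower is better
    critique: str  # assumed stand-in for a "textual gradient": what to fix, in words


def refine(
    plan: Callable[[], Trajectory],
    inspect: Callable[[Trajectory], Feedback],
    evolve: Callable[[Trajectory, Feedback], Trajectory],
    verify: Callable[[Trajectory], bool],
    max_rounds: int = 5,
) -> Tuple[Trajectory, Feedback, bool]:
    """Hypothetical PLAN -> INSPECT -> EVOLVE loop followed by VERIFY."""
    best = plan()                 # PLAN: propose an initial trajectory
    best_fb = inspect(best)       # INSPECT: execute it and score it
    for _ in range(max_rounds):
        if best_fb.loss == 0:
            break                 # nothing left to fix
        candidate = evolve(best, best_fb)  # EVOLVE: revise using the critique
        cand_fb = inspect(candidate)
        # Monotonic acceptance (assumed form): keep the candidate only if it
        # strictly lowers the loss, so the retained trajectory never gets worse.
        if cand_fb.loss < best_fb.loss:
            best, best_fb = candidate, cand_fb
    return best, best_fb, verify(best)     # VERIFY: final check on the result


# Toy environment: any step outside ALLOWED counts as an infeasible action.
ALLOWED = {"search_flights", "book_flight", "book_hotel"}


def toy_plan() -> Trajectory:
    return ["search_flights", "teleport", "book_hotel"]


def toy_inspect(traj: Trajectory) -> Feedback:
    bad = [step for step in traj if step not in ALLOWED]
    critique = f"infeasible steps: {bad}" if bad else "ok"
    return Feedback(loss=float(len(bad)), critique=critique)


def toy_evolve(traj: Trajectory, fb: Feedback) -> Trajectory:
    # Replace the first infeasible step with a feasible one.
    revised = list(traj)
    for i, step in enumerate(revised):
        if step not in ALLOWED:
            revised[i] = "book_flight"
            break
    return revised


def toy_verify(traj: Trajectory) -> bool:
    return all(step in ALLOWED for step in traj)


final, fb, ok = refine(toy_plan, toy_inspect, toy_evolve, toy_verify)
```

In this toy run the infeasible `teleport` step is replaced and the revised trajectory is accepted because its loss drops. If `evolve` instead proposed a worse trajectory, the acceptance check would discard it and the earlier trajectory would be returned unchanged.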

Key facts

PIVOT stands for Plan-Inspect-eVOlve Trajectories
It is a self-supervised framework for LLM agents
It addresses plan-execution misalignment in LLM-based agents
Four stages: PLAN, INSPECT, EVOLVE, VERIFY
Uses structured losses with textual gradients
Monotonic acceptance process ensures non-decreasing quality
Evaluated on DeepPlanning and GAIA benchmarks
Achieves state-of-the-art performance, particularly with human-in-the-loop (HITL) feedback
Paper available on arXiv: 2605.11225
