PIVOT: Self-Supervised Trajectory Refinement for LLM Agents
PIVOT (Plan-Inspect-eVOlve Trajectories) is a self-supervised framework that refines LLM agent trajectories iteratively through interaction with the environment. It targets plan-execution misalignment: LLM-based agents often produce coherent plans that fail in execution because of infeasible actions, constraint violations, and compounding errors. The framework has four phases: PLAN generates candidate trajectories; INSPECT executes them and computes structured losses using textual gradients; EVOLVE revises the trajectories based on these signals; and VERIFY performs a comprehensive final assessment. A monotonic acceptance rule ensures that solution quality never decreases from one iteration to the next. Experiments on DeepPlanning and GAIA report state-of-the-art performance, particularly with human-in-the-loop feedback. The paper is available on arXiv as 2605.11225.
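The four phases and the monotonic acceptance rule can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the paper's implementation: the names `refine`, `Feedback`, and the `toy_*` helpers are hypothetical, the loss is a simple count of wrong steps, and the "textual gradient" is reduced to a one-line critique string. The summary above does not specify how PIVOT defines its losses or its acceptance test.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Trajectory = List[str]


@dataclass
class Feedback:
    loss: float    # hypothetical scalar loss; lower is better
    critique: str  # stand-in for a textual gradient


def refine(
    plan: Callable[[], Trajectory],
    inspect: Callable[[Trajectory], Feedback],
    evolve: Callable[[Trajectory, Feedback], Trajectory],
    verify: Callable[[Trajectory], bool],
    max_rounds: int = 5,
) -> Tuple[Trajectory, bool, List[float]]:
    best = plan()                # PLAN: initial candidate trajectory
    best_fb = inspect(best)      # INSPECT: execute and score it
    history = [best_fb.loss]
    for _ in range(max_rounds):
        if best_fb.loss == 0:
            break
        candidate = evolve(best, best_fb)  # EVOLVE: revise using the critique
        cand_fb = inspect(candidate)
        # Monotonic acceptance (assumed form): never keep a worse candidate.
        if cand_fb.loss <= best_fb.loss:
            best, best_fb = candidate, cand_fb
        history.append(best_fb.loss)
    return best, verify(best), history     # VERIFY: final check


# Toy environment: the "correct" trajectory is known, purely for illustration.
TARGET = ["search", "book_flight", "book_hotel", "confirm"]


def toy_plan() -> Trajectory:
    return ["search", "book_hotel", "book_flight", "pay"]


def toy_inspect(traj: Trajectory) -> Feedback:
    wrong = [i for i, (a, b) in enumerate(zip(traj, TARGET)) if a != b]
    critique = f"step {wrong[0]} should be {TARGET[wrong[0]]}" if wrong else "ok"
    return Feedback(loss=float(len(wrong)), critique=critique)


def toy_evolve(traj: Trajectory, fb: Feedback) -> Trajectory:
    if fb.critique == "ok":
        return list(traj)
    parts = fb.critique.split()
    idx, action = int(parts[1]), parts[-1]
    revised = list(traj)
    revised[idx] = action  # apply the single fix named in the critique
    return revised


def toy_verify(traj: Trajectory) -> bool:
    return traj == TARGET


best, ok, history = refine(toy_plan, toy_inspect, toy_evolve, toy_verify)
print(history)  # [3.0, 2.0, 1.0, 0.0]
```

Because a candidate replaces the current best only when its loss is no higher, the recorded loss history cannot increase, which is the property the monotonic acceptance rule is meant to provide. In a real agent, the inspect and evolve steps would presumably be backed by environment execution and LLM calls rather than string comparison.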
Key facts
- PIVOT stands for Plan-Inspect-eVOlve Trajectories
- Self-supervised framework for LLM agents
- Addresses plan-execution misalignment
- Four stages: PLAN, INSPECT, EVOLVE, VERIFY
- Uses structured losses with textual gradients
- Monotonic acceptance process ensures non-decreasing quality
- Evaluated on DeepPlanning and GAIA benchmarks
- Achieves state-of-the-art performance, particularly with human-in-the-loop (HITL) feedback
- Paper available on arXiv: 2605.11225
Entities
Institutions
- arXiv