HCL-GP: Hierarchical Policy Learning Boosts LLM Agent Performance

ai-technology · 2026-05-11

Hierarchical Component Learning for Generalized Policies (HCL-GP) is a technique that combines hierarchical task decomposition with generalized planning for LLM-based agents. It learns parameterized policies that generalize across task instances, and it systematically extracts reusable components from successful executions, storing them in a library from which new policies are composed. The method addresses three challenges: automated decomposition for component learning, generalization to maximize component reuse, and efficient component retrieval through semantic search. On the AppWorld benchmark, HCL-GP reached 98.2% accuracy on normal tasks and 97.8% on challenge tasks involving unseen applications, a 15.8-point improvement over static synthesis in challenging scenarios. With open-source models, dynamic reuse yields a 62.5% success rate, compared with nearly zero without it. The paper is available on arXiv under ID 2605.06957.
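To make the idea of a component library with semantic retrieval more concrete, here is a minimal Python sketch. It is not the paper's implementation: the names Component, ComponentLibrary, find_contact and total_spent are hypothetical, and a toy bag-of-words cosine similarity stands in for whatever embedding-based search HCL-GP actually uses. It assumes each component is a parameterized function that was kept only because a run using it succeeded.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Component:
    """A reusable, parameterized policy fragment (hypothetical structure)."""
    name: str
    description: str
    run: Callable[..., object]


class ComponentLibrary:
    """Stores components and retrieves them by description similarity."""

    def __init__(self):
        self._items = []  # list of (description vector, component) pairs

    def add(self, component: Component) -> None:
        # Assumption: callers only add components taken from successful runs.
        self._items.append((embed(component.description), component))

    def retrieve(self, query: str, k: int = 1) -> list:
        # Rank stored components by similarity to the query, best first.
        q = embed(query)
        ranked = sorted(self._items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [component for _, component in ranked[:k]]


library = ComponentLibrary()
library.add(Component(
    "find_contact",
    "look up contact phone number by name",
    lambda contacts, name: contacts[name],
))
library.add(Component(
    "total_spent",
    "sum amounts across payment transactions",
    lambda payments: sum(payments),
))

# Retrieve a stored component for a new task and call it with new arguments.
best = library.retrieve("find phone number for contact")[0]
print(best.name, best.run({"alice": "555-0100"}, "alice"))
```

Because each component takes its inputs as parameters, the same stored function can be reused on a different task instance (another contact list, another name) without being synthesized again, which is the kind of reuse the summary describes.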

Key facts

HCL-GP combines generalized planning and hierarchical task decomposition for LLM agents.
It learns parameterized policies that generalize across task instances.
Reusable components are extracted from successful executions and stored in a library.
Three challenges addressed: automated decomposition, component generalization, and semantic retrieval.
Evaluated on AppWorld benchmark: 98.2% accuracy on normal tasks, 97.8% on challenge tasks.
Improves by 15.8 points over static synthesis in challenging scenarios.
Open-source models achieve 62.5% success with dynamic reuse vs near-zero without.
Paper published on arXiv with ID 2605.06957.
