Transformer Model Self-Improves for Optimal Plan Generation
A recent study shows that decoder-only transformers can generate high-quality plans for previously unseen problems when trained on optimal data. The researchers tackle the challenge of generating optimal plans in sub-exponential time. They demonstrate how to improve a model initially trained on suboptimal data: multiple model calls are combined with graph search to produce refined plans, which are then used for fine-tuning. Experiments in the Blocksworld, Logistics, Labyrinth, and Sokoban domains show an average 30% reduction in plan length compared to the source symbolic planner, with over 80% of plans being optimal where the optimum is known. Search at inference time further improves plan quality.
Key facts
- Generative models trained on synthetic plan data are used for generalized planning.
- Prior work targeted any valid plan rather than high-quality solutions.
- Decoder-only transformer can generate high-quality plans for unseen problems given optimal data.
- Self-improvement combines multiple model calls with graph search.
- Experiments on four domains: Blocksworld, Logistics, Labyrinth, Sokoban.
- Average 30% reduction in plan length over source symbolic planner.
- Over 80% of plans are optimal where optimum is known.
- Inference-time search further improves plan quality.
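The self-improvement loop in the key facts above can be illustrated with a toy sketch: a weak "model" proposes valid but overly long plans, and graph search refines them into shorter plans that form a better fine-tuning dataset. Everything here is a hypothetical stand-in (a grid maze, DFS as the weak planner, BFS as the search refinement), not the paper's actual architecture or algorithm.

```python
from collections import deque

# Toy maze domain; '#' cells are walls. Illustrative only, not from the paper.
GRID = ["....#...",
        ".##.#.#.",
        "....#.#.",
        ".##...#.",
        "......#."]

def neighbors(pos):
    r, c = pos
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == ".":
            yield (nr, nc)

def weak_model_plan(start, goal):
    """Stand-in for the initial transformer: DFS yields valid but often long plans."""
    stack, seen = [(start, [start])], {start}
    while stack:
        pos, path = stack.pop()
        if pos == goal:
            return path
        for nxt in neighbors(pos):
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None

def search_refine(start, goal):
    """Graph-search refinement step: BFS finds a shortest plan in this toy domain."""
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        pos, path = queue.popleft()
        if pos == goal:
            return path
        for nxt in neighbors(pos):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

def self_improvement_round(tasks):
    """One round: draft plans with the weak model, refine them with search,
    and keep only the strictly shortened plans as new fine-tuning data."""
    dataset = []
    for start, goal in tasks:
        draft = weak_model_plan(start, goal)
        refined = search_refine(start, goal)
        if draft and refined and len(refined) < len(draft):
            dataset.append((start, goal, refined))
    return dataset

tasks = [((0, 0), (4, 5)), ((2, 0), (0, 3))]
data = self_improvement_round(tasks)
for s, g, plan in data:
    print(s, g, "refined plan length:", len(plan) - 1)
```

In this sketch only tasks where search actually shortens the draft plan enter the new dataset; in the paper's setting the refined plans would then be used to fine-tune the transformer, and the loop can repeat.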