Transformer Model Self-Improves for Optimal Plan Generation
A recent study shows that decoder-only transformers can generate high-quality plans for previously unseen problems when trained on optimal data. The researchers tackle the challenge of generating optimal plans in sub-exponential time. They demonstrate how to improve a model initially trained on suboptimal data: multiple model calls are combined with graph search to produce refined plans, which are then used for fine-tuning. Experiments in the Blocksworld, Logistics, Labyrinth, and Sokoban domains show an average 30% reduction in plan length compared to the source symbolic planner, with over 80% of plans being optimal where the optimum is known. Search at inference time further improves plan quality.
Key facts
- Generative models trained on synthetic plan data are used for generalized planning.
- Prior work targeted any valid plan rather than high-quality solutions.
- Decoder-only transformer can generate high-quality plans for unseen problems given optimal data.
- Self-improvement combines multiple model calls with graph search.
- Experiments on four domains: Blocksworld, Logistics, Labyrinth, Sokoban.
- Average 30% reduction in plan length over source symbolic planner.
- Over 80% of plans are optimal where optimum is known.
- Inference-time search further improves plan quality.
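The self-improvement loop in the key facts above can be illustrated with a toy sketch: a weak "model" proposes valid but overly long plans, and graph search refines them into shorter plans that form a better fine-tuning dataset. Everything here is a hypothetical stand-in (a grid maze, DFS as the weak planner, BFS as the search refinement), not the paper's actual architecture or algorithm.

```python
from collections import deque

# Toy maze domain; '#' cells are walls. Illustrative only, not from the paper.
GRID = ["....#...",
        ".##.#.#.",
        "....#.#.",
        ".##...#.",
        "......#."]

def neighbors(pos):
    r, c = pos
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == ".":
            yield (nr, nc)

def weak_model_plan(start, goal):
    """Stand-in for the initial transformer: DFS yields valid but often long plans."""
    stack, seen = [(start, [start])], {start}
    while stack:
        pos, path = stack.pop()
        if pos == goal:
            return path
        for nxt in neighbors(pos):
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None

def search_refine(start, goal):
    """Graph-search refinement step: BFS finds a shortest plan in this toy domain."""
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        pos, path = queue.popleft()
        if pos == goal:
            return path
        for nxt in neighbors(pos):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

def self_improvement_round(tasks):
    """One round: draft plans with the weak model, refine them with search,
    and keep only the strictly shortened plans as new fine-tuning data."""
    dataset = []
    for start, goal in tasks:
        draft = weak_model_plan(start, goal)
        refined = search_refine(start, goal)
        if draft and refined and len(refined) < len(draft):
            dataset.append((start, goal, refined))
    return dataset

tasks = [((0, 0), (4, 5)), ((2, 0), (0, 3))]
data = self_improvement_round(tasks)
for s, g, plan in data:
    print(s, g, "refined plan length:", len(plan) - 1)
```

In this sketch only tasks where search actually shortens the draft plan enter the new dataset; in the paper's setting the refined plans would then be used to fine-tune the transformer, and the loop can repeat.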