PlanningBench: Scalable Planning Data Generation for LLMs

ai-technology · 2026-05-22

PlanningBench is a new framework for generating scalable, diverse, and verifiable planning data for evaluating and training large language models. Existing benchmarks treat planning data as fixed collections, which restricts scenario coverage and ties difficulty to surface-level proxies. PlanningBench instead abstracts real planning scenarios into a structured taxonomy covering over 30 task types, together with their subtasks, constraint families, and difficulty factors. This taxonomy enables controllable generation, automatic verification, and planning-oriented training. The approach is intended to broaden scenario coverage and to draw difficulty from structural sources, with the goal of improving LLM planning capabilities.
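The idea of generating instances from a structured specification and then checking plans automatically can be illustrated with a minimal Python sketch. Nothing below is taken from PlanningBench itself: TaskSpec, generate_instance, and verify_plan are hypothetical names, and precedence-constrained ordering is used only as an assumed stand-in for one constraint family, with problem size and constraint count as assumed difficulty factors.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical taxonomy entry; not the PlanningBench schema."""
    task_type: str        # e.g. "scheduling" (illustrative label)
    num_steps: int        # assumed difficulty factor: problem size
    num_precedences: int  # assumed difficulty factor: constraint count
    seed: int = 0         # makes generation reproducible


def generate_instance(spec: TaskSpec) -> dict:
    """Build an ordering problem that is solvable by construction."""
    rng = random.Random(spec.seed)
    steps = [f"s{i}" for i in range(spec.num_steps)]

    # Pick a hidden valid order, then sample constraints consistent with it.
    hidden_order = steps[:]
    rng.shuffle(hidden_order)
    pairs = [
        (hidden_order[i], hidden_order[j])
        for i in range(len(hidden_order))
        for j in range(i + 1, len(hidden_order))
    ]
    count = min(spec.num_precedences, len(pairs))
    precedences = rng.sample(pairs, count)

    return {
        "task_type": spec.task_type,
        "steps": steps,
        "precedences": precedences,      # (a, b) means a must come before b
        "reference_plan": hidden_order,  # one known-valid solution
    }


def verify_plan(instance: dict, plan: list) -> bool:
    """Check a candidate plan against the instance's constraints."""
    # The plan must use every step exactly once.
    if sorted(plan) != sorted(instance["steps"]):
        return False
    position = {step: index for index, step in enumerate(plan)}
    return all(position[a] < position[b] for a, b in instance["precedences"])
```

In this sketch, raising num_steps or num_precedences changes the structure of the problem rather than its surface wording, and verify_plan can score any model output without a human grader. A real generator would need many more task types and constraint families than this single example covers.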

Key facts

PlanningBench generates scalable, diverse, and verifiable planning data.
Existing planning benchmarks treat data as fixed collections.
The framework uses a taxonomy of over 30 task types.
It abstracts real planning scenarios into structured categories.
It supports automatic verification and planning-oriented training.
It addresses limitations in scenario coverage and difficulty proxies.
The goal is to improve LLM planning capabilities.
Published on arXiv with ID 2605.20873.
