Hierarchical Task Decomposition Boosts LLM Spatial Reasoning

ai-technology · 2026-05-28

A recent arXiv preprint (2605.28144) presents a hierarchical decomposition technique for improving spatial reasoning in large language models (LLMs). Drawing on hierarchical reinforcement learning, the method has an LLM break a complex spatial task into simpler sub-tasks by identifying key intermediate states and constructing simplified sub-environments. In practice, however, LLMs often fail to identify optimal intermediate states because they lack adequate spatial priors, which makes the decomposition less effective. To address this, the authors propose MCTS-Guided Group Relative Policy Optimization (M-GRPO), which reformulates the UCT formula used in Monte Carlo tree search (MCTS) to incorporate the LLM's prior predictive probabilities. The stated goal is to strengthen LLM planning for applications in embodied intelligence.
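To make the idea of a prior-weighted UCT rule concrete, here is a minimal sketch in Python. The summary does not give the preprint's exact formula, so this is an assumption: it contrasts standard UCT with the well-known PUCT-style variant, in which the exploration bonus is scaled by a prior probability. The function names, the dictionary fields (`q`, `prior`, `visits`), and the constant `c` are hypothetical and chosen only for illustration.

```python
import math


def uct_score(q_value, parent_visits, child_visits, c=1.4):
    """Standard UCT: mean value plus a visit-count exploration bonus."""
    if child_visits == 0:
        return math.inf  # unvisited children are always tried first
    return q_value + c * math.sqrt(math.log(parent_visits) / child_visits)


def prior_weighted_score(q_value, prior, parent_visits, child_visits, c=1.4):
    """PUCT-style score: the exploration bonus is scaled by a prior.

    Assumption: `prior` stands in for the LLM's predictive probability of
    this candidate intermediate state. The preprint's actual reformulation
    may differ from this form.
    """
    return q_value + c * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select_child(children, parent_visits, c=1.4):
    """Pick the child with the highest prior-weighted score."""
    return max(
        children,
        key=lambda ch: prior_weighted_score(
            ch["q"], ch["prior"], parent_visits, ch["visits"], c
        ),
    )


# Two candidates with equal value estimates and visit counts:
# the one the model considers more likely gets explored first.
candidates = [
    {"name": "A", "q": 0.5, "prior": 0.1, "visits": 3},
    {"name": "B", "q": 0.5, "prior": 0.7, "visits": 3},
]
best = select_child(candidates, parent_visits=6)
print(best["name"])
```

Under this form, a candidate with a low prior can still be selected once its value estimate is high enough, so the prior steers the search without fully overriding it.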

Key facts

arXiv preprint 2605.28144 proposes hierarchical decomposition for LLM spatial reasoning
Method inspired by hierarchical reinforcement learning
LLMs decompose complex tasks into sub-tasks via intermediate states and sub-environments
LLMs often fail to derive optimal intermediate states due to insufficient spatial priors
M-GRPO (MCTS-Guided Group Relative Policy Optimization) introduced to address this limitation
M-GRPO reformulates UCT formula using LLM's prior predictive probabilities
Goal is to enhance LLM planning for embodied intelligence
Paper published on arXiv under announcement type 'new'
