2.5-D Decomposition Boosts LLM Spatial Construction Accuracy to 94.6%
A neuro-symbolic approach called 2.5-D decomposition improves the performance of large language models (LLMs) on spatial construction tasks. The technique, introduced in arXiv:2605.07066, splits planning into two stages: the LLM plans on a two-dimensional horizontal plane, while a deterministic executor derives each block's vertical placement from column occupancy. This removes the coordinate errors that commonly arise when 3D block placements are generated directly from natural-language directives. On the Build What I Mean benchmark (160 rounds), GPT-4o-mini with this pipeline achieved a mean structural accuracy of 94.6% across 12 independent trials, just 3.0 percentage points below the 97.6% ceiling imposed by architect-agent errors, surpassing GPT-4o's 90.3% and the best competing system's 76.3%. A controlled ablation study attributes 50.7% of the improvement to the 2.5-D decomposition, enabling autonomous systems to build structures from natural-language instructions with reliable spatial reasoning.
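The division of labor described above can be sketched in a few lines. The snippet below is a hypothetical illustration, not the paper's implementation: it assumes the planner (e.g. an LLM) emits an ordered list of (x, z) cells on the horizontal plane, and a deterministic executor assigns each block's vertical coordinate from the current height of its column, so the model never has to reason about the y axis.

```python
from collections import defaultdict

def place_blocks(plan_2d):
    """Turn a 2-D horizontal plan into 3-D block placements.

    plan_2d: ordered (x, z) cells chosen by the planner. The vertical
    coordinate is never chosen by the planner; each block lands on top
    of whatever already occupies its (x, z) column.
    """
    height = defaultdict(int)   # column occupancy: (x, z) -> stack height
    placements = []
    for x, z in plan_2d:
        y = height[(x, z)]      # deterministic vertical placement
        placements.append((x, y, z))
        height[(x, z)] += 1     # the column grows by one block
    return placements

# A two-block tower on (0, 0) plus one block at (1, 0):
print(place_blocks([(0, 0), (0, 0), (1, 0)]))
# [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
```

Because the executor is a pure function of column occupancy, vertical placement is exact by construction, which is the intuition behind how the decomposition eliminates systematic coordinate errors.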
Key facts
- arXiv:2605.07066 introduces 2.5-D decomposition for LLM-based spatial construction.
- The pipeline separates planning into 2D horizontal plane (LLM) and deterministic vertical executor.
- GPT-4o-mini with pipeline achieves 94.6% mean structural accuracy on Build What I Mean benchmark.
- Accuracy is within 3.0 percentage points of the 97.6% ceiling from architect-agent errors.
- Outperforms GPT-4o (90.3%) and best competing system (76.3%).
- Ablation shows 2.5-D decomposition accounts for 50.7% of improvement.
- Method eliminates systematic coordinate errors in 3D block placement.
- Enables autonomous systems to follow natural-language building instructions.