ARTFEED — Contemporary Art Intelligence

VeriGraph Framework Enhances Robot Task Planning with Scene Graphs and Vision-Language Models

ai-technology · 2026-04-20

VeriGraph is a novel framework that couples vision-language models for robotic planning with scene-graph-based verification of action feasibility. By generating scene graphs from input images, it captures the task-relevant objects and their spatial relationships, enabling accurate plan verification and refinement. The system uses these scene graphs to iteratively check and correct action sequences produced by a task planner built on a large language model (LLM), ensuring that constraints are respected and each action is executable. This approach markedly improves task completion rates across diverse manipulation tasks, with reported gains of 58% on language-based tasks, 56% on tangram puzzles, and 30% on image-based tasks compared to baseline methods. VeriGraph thereby addresses a key shortcoming of existing vision-language models, which frequently produce infeasible action sequences. Recent advances in these models have opened new opportunities for robot task planning, with scene graphs serving as a crucial intermediate representation for improved plan verification.

Key facts

  • VeriGraph integrates vision-language models for robotic planning with action feasibility verification
  • The framework uses scene graphs as an intermediate representation to capture objects and spatial relationships
  • Scene graphs are generated from input images to enable plan verification and refinement
  • The system iteratively checks and corrects action sequences from an LLM-based task planner
  • VeriGraph ensures constraints are respected and actions are executable
  • Task completion rates improve by 58% on language-based tasks compared to baselines
  • Performance gains include 56% improvement on tangram puzzle tasks
  • Image-based tasks show 30% improvement over baseline methods
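The verify-and-correct loop described above can be sketched in miniature. The code below is a hypothetical illustration, not the authors' implementation: it assumes a scene graph stored as (subject, relation, object) triples, a single "place" action type with a made-up precondition (the destination must be unoccupied), and a verifier that simulates a candidate plan and reports the index of the first infeasible step, which a real system could feed back to the LLM planner for revision.

```python
from dataclasses import dataclass

@dataclass
class SceneGraph:
    # Hypothetical scene-graph representation: object names plus
    # (subject, relation, object) triples such as ("block_a", "on", "table").
    objects: set
    relations: set

    def apply(self, action):
        """Apply a pick-and-place action if its preconditions hold.

        Returns the successor SceneGraph, or None if the action is infeasible.
        """
        kind, obj, dest = action
        if kind != "place" or obj not in self.objects or dest not in self.objects:
            return None
        # Assumed precondition: nothing may already sit on the destination.
        if any(r == "on" and o == dest for (_, r, o) in self.relations):
            return None
        # Effect: obj is no longer on its old support and now rests on dest.
        new_relations = {(s, r, o) for (s, r, o) in self.relations
                         if not (s == obj and r == "on")}
        new_relations.add((obj, "on", dest))
        return SceneGraph(self.objects, new_relations)

def verify_plan(graph, plan):
    """Simulate a plan step by step against the scene graph.

    Returns -1 if every step is feasible, otherwise the index of the
    first failing step (feedback a planner could use to revise the plan).
    """
    state = graph
    for i, action in enumerate(plan):
        state = state.apply(action)
        if state is None:
            return i
    return -1
```

In this toy setting, verifying a plan that references a nonexistent object or an occupied destination pinpoints the offending step, mirroring how a scene graph lets the framework reject and refine LLM-generated action sequences before execution.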
