ARTFEED — Contemporary Art Intelligence

VeriGraph Framework Enhances Robot Task Planning with Scene Graphs and Vision-Language Models

ai-technology · 2026-04-20

VeriGraph is a novel framework that couples vision-language models for robotic planning with scene-graph-based verification of action feasibility. By generating scene graphs from input images, it captures the task-relevant objects and their spatial relationships, enabling accurate plan verification and refinement. The system uses these scene graphs to iteratively check and correct action sequences produced by a task planner built on a large language model (LLM), ensuring that constraints are respected and each action is executable. This approach markedly improves task completion rates across diverse manipulation tasks, with reported gains of 58% on language-based tasks, 56% on tangram puzzles, and 30% on image-based tasks compared to baseline methods. VeriGraph thereby addresses a key shortcoming of existing vision-language models, which frequently produce infeasible action sequences. Recent advances in these models have opened new opportunities for robot task planning, with scene graphs serving as a crucial intermediate representation for improved plan verification.

Key facts

  • VeriGraph integrates vision-language models for robotic planning with action feasibility verification
  • The framework uses scene graphs as an intermediate representation to capture objects and spatial relationships
  • Scene graphs are generated from input images to enable plan verification and refinement
  • The system iteratively checks and corrects action sequences from an LLM-based task planner
  • VeriGraph ensures constraints are respected and actions are executable
  • Task completion rates improve by 58% on language-based tasks compared to baselines
  • Performance gains include 56% improvement on tangram puzzle tasks
  • Image-based tasks show 30% improvement over baseline methods
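The verify-and-correct loop described above can be sketched in miniature. The code below is a hypothetical illustration, not the authors' implementation: it assumes a scene graph stored as (subject, relation, object) triples, a single "place" action type with a made-up precondition (the destination must be unoccupied), and a verifier that simulates a candidate plan and reports the index of the first infeasible step, which a real system could feed back to the LLM planner for revision.

```python
from dataclasses import dataclass

@dataclass
class SceneGraph:
    # Hypothetical scene-graph representation: object names plus
    # (subject, relation, object) triples such as ("block_a", "on", "table").
    objects: set
    relations: set

    def apply(self, action):
        """Apply a pick-and-place action if its preconditions hold.

        Returns the successor SceneGraph, or None if the action is infeasible.
        """
        kind, obj, dest = action
        if kind != "place" or obj not in self.objects or dest not in self.objects:
            return None
        # Assumed precondition: nothing may already sit on the destination.
        if any(r == "on" and o == dest for (_, r, o) in self.relations):
            return None
        # Effect: obj is no longer on its old support and now rests on dest.
        new_relations = {(s, r, o) for (s, r, o) in self.relations
                         if not (s == obj and r == "on")}
        new_relations.add((obj, "on", dest))
        return SceneGraph(self.objects, new_relations)

def verify_plan(graph, plan):
    """Simulate a plan step by step against the scene graph.

    Returns -1 if every step is feasible, otherwise the index of the
    first failing step (feedback a planner could use to revise the plan).
    """
    state = graph
    for i, action in enumerate(plan):
        state = state.apply(action)
        if state is None:
            return i
    return -1
```

In this toy setting, verifying a plan that references a nonexistent object or an occupied destination pinpoints the offending step, mirroring how a scene graph lets the framework reject and refine LLM-generated action sequences before execution.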
