AI Agent Learns to Sketch Objects Part by Part
Researchers have introduced a method for generating vector sketches incrementally with an agent built on a multi-modal language model. The approach combines supervised fine-tuning with a novel reinforcement learning strategy based on multi-turn process rewards. To support training, the authors built ControlSketch-Part, a new dataset with detailed part-level sketch annotations, produced by an automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to those parts through a structured multi-stage labeling process. The results show that leveraging structured part-level information, together with visual feedback to the agent during generation, enables interpretable, controllable, and locally editable text-to-vector sketch generation.
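The part-by-part generation loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `propose_part_paths` is a hypothetical stand-in for the multi-modal agent, and the visual feedback step is simplified to passing back the accumulated SVG markup rather than a rendered image.

```python
# Hedged sketch of incremental, part-by-part vector sketch generation.
# `propose_part_paths` is a placeholder for the multi-modal LLM agent;
# the real system would render the current canvas to an image and feed
# that rendering back to the agent as visual feedback.

def propose_part_paths(part: str, canvas_svg: str) -> list[str]:
    # Placeholder agent: emits one dummy SVG path per semantic part.
    # A real agent would condition on the text prompt, the part label,
    # and a rendering of the current canvas.
    return [f'<path d="M0 0 L10 10" class="{part}"/>']

def generate_sketch(prompt: str, parts: list[str]) -> str:
    """Build a vector sketch one semantic part at a time."""
    paths: list[str] = []
    for part in parts:
        # Simplified visual feedback: the canvas drawn so far.
        canvas = "<svg>" + "".join(paths) + "</svg>"
        paths.extend(propose_part_paths(part, canvas))
    return "<svg>" + "".join(paths) + "</svg>"

sketch = generate_sketch("a cat", ["head", "ears", "body", "tail"])
```

Because each part's paths are appended as a separate group, the output remains locally editable: the paths belonging to one semantic part can be modified without touching the rest of the sketch.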
Key facts
- Method produces vector sketches one part at a time
- Uses multi-modal language model-based agent
- Combines supervised fine-tuning with novel multi-turn process-reward reinforcement learning
- New dataset: ControlSketch-Part
- Automatic annotation pipeline segments sketches into semantic parts
- Structured multi-stage labeling process assigns paths to parts
- Enables interpretable, controllable, locally editable text-to-vector sketch generation
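The multi-turn process-reward idea can be illustrated with a small, hedged example: assuming each generation turn (one part) receives its own reward signal, per-turn returns are the discounted sum of the current and future rewards. The reward values and discount factor below are illustrative assumptions, not figures from the paper.

```python
# Illustrative multi-turn process rewards: one reward per turn (part),
# folded into per-turn discounted returns. Values here are made up.

def per_turn_returns(rewards: list[float], gamma: float = 0.9) -> list[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1} for each turn t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# e.g. hypothetical rewards after drawing head, ears, body, tail
returns = per_turn_returns([1.0, 0.5, 0.8, 1.0])
```

Assigning credit per turn, rather than only at the end, is what makes this a *process* reward: the agent gets a learning signal for each part it draws.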
Entities
Institutions
- arXiv