AI Agent Learns to Sketch Objects Part by Part
Researchers have introduced a method for generating vector sketches incrementally with an agent built on a multi-modal language model. The approach combines supervised fine-tuning with a novel reinforcement learning strategy based on multi-turn process rewards. To support training, the authors built ControlSketch-Part, a new dataset with detailed part-level sketch annotations, produced by an automatic annotation pipeline that segments vector sketches into semantic parts and assigns paths to those parts through a structured multi-stage labeling process. The results show that leveraging structured part-level information, together with visual feedback to the agent during generation, enables interpretable, controllable, and locally editable text-to-vector sketch generation.
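The part-by-part generation loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `propose_part_paths` is a hypothetical stand-in for the multi-modal agent, and the visual feedback step is simplified to passing back the accumulated SVG markup rather than a rendered image.

```python
# Hedged sketch of incremental, part-by-part vector sketch generation.
# `propose_part_paths` is a placeholder for the multi-modal LLM agent;
# the real system would render the current canvas to an image and feed
# that rendering back to the agent as visual feedback.

def propose_part_paths(part: str, canvas_svg: str) -> list[str]:
    # Placeholder agent: emits one dummy SVG path per semantic part.
    # A real agent would condition on the text prompt, the part label,
    # and a rendering of the current canvas.
    return [f'<path d="M0 0 L10 10" class="{part}"/>']

def generate_sketch(prompt: str, parts: list[str]) -> str:
    """Build a vector sketch one semantic part at a time."""
    paths: list[str] = []
    for part in parts:
        # Simplified visual feedback: the canvas drawn so far.
        canvas = "<svg>" + "".join(paths) + "</svg>"
        paths.extend(propose_part_paths(part, canvas))
    return "<svg>" + "".join(paths) + "</svg>"

sketch = generate_sketch("a cat", ["head", "ears", "body", "tail"])
```

Because each part's paths are appended as a separate group, the output remains locally editable: the paths belonging to one semantic part can be modified without touching the rest of the sketch.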
Key facts
- Method produces vector sketches one part at a time
- Uses multi-modal language model-based agent
- Combines supervised fine-tuning with novel multi-turn process-reward reinforcement learning
- New dataset: ControlSketch-Part
- Automatic annotation pipeline segments sketches into semantic parts
- Structured multi-stage labeling process assigns paths to parts
- Enables interpretable, controllable, locally editable text-to-vector sketch generation
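The multi-turn process-reward idea can be illustrated with a small, hedged example: assuming each generation turn (one part) receives its own reward signal, per-turn returns are the discounted sum of the current and future rewards. The reward values and discount factor below are illustrative assumptions, not figures from the paper.

```python
# Illustrative multi-turn process rewards: one reward per turn (part),
# folded into per-turn discounted returns. Values here are made up.

def per_turn_returns(rewards: list[float], gamma: float = 0.9) -> list[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1} for each turn t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# e.g. hypothetical rewards after drawing head, ears, body, tail
returns = per_turn_returns([1.0, 0.5, 0.8, 1.0])
```

Assigning credit per turn, rather than only at the end, is what makes this a *process* reward: the agent gets a learning signal for each part it draws.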
Entities
Institutions
- arXiv