ARTFEED — Contemporary Art Intelligence

New Study Introduces ManimTrainer and ManimAgent for LLM-Based Animation Generation

ai-technology · 2026-04-22

A recent research paper introduces two methods, ManimTrainer and ManimAgent, designed to improve the ability of Large Language Models (LLMs) to generate programmatic animations with the Manim library. The study targets specific hurdles LLMs face in this setting: spatial reasoning, temporal sequencing, and the underrepresentation of Manim's domain-specific APIs in general pre-training data. ManimTrainer combines Supervised Fine-Tuning (SFT) with Reinforcement Learning via Group Relative Policy Optimization (GRPO), using a unified reward signal that fuses code-level and visual assessments. At inference time, ManimAgent employs renderer-in-the-loop (RITL) generation and a documentation-augmented variant, RITL-DOC. The work is presented as the first unified training-and-inference study of text-to-code-to-video generation with Manim, and it evaluates 17 open-source models. The paper is available on arXiv under identifier 2604.18364v1.
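The renderer-in-the-loop idea can be sketched as a generate-render-repair cycle. The sketch below is illustrative only: the function names (`generate_code`, `render`), the use of Python's `compile` as a stand-in renderer check, and the hard-coded model outputs are assumptions, not the paper's actual pipeline.

```python
from typing import Optional

# Hedged sketch of a renderer-in-the-loop (RITL) inference strategy.
# All interfaces here are hypothetical placeholders for an LLM call
# and a Manim render step; the paper's pipeline may differ.

def generate_code(prompt: str, feedback: str = "") -> str:
    # Placeholder LLM call that emits Manim scene code. With feedback,
    # the model would be asked to repair its previous attempt; here the
    # two outcomes are hard-coded for illustration.
    if feedback:
        return ("from manim import *\n"
                "class Demo(Scene):\n"
                "    def construct(self):\n"
                "        self.play(Create(Circle()))")
    # First attempt: deliberately broken (missing closing parenthesis).
    return ("from manim import *\n"
            "class Demo(Scene):\n"
            "    def construct(self):\n"
            "        self.play(Create(Circle())")

def render(code: str) -> Optional[str]:
    # Placeholder renderer: a real pipeline would invoke Manim and
    # inspect the result; here a syntax check stands in for rendering.
    # Returns an error message on failure, None on success.
    try:
        compile(code, "<scene>", "exec")
    except SyntaxError as exc:
        return str(exc)
    return None

def ritl(prompt: str, max_rounds: int = 3) -> str:
    feedback = ""
    code = ""
    for _ in range(max_rounds):
        code = generate_code(prompt, feedback)
        error = render(code)
        if error is None:
            return code      # renderable code: stop early
        feedback = error     # feed the renderer error back to the model
    return code              # best effort after max_rounds attempts
```

In this toy run the first attempt fails the renderer check, the error is fed back, and the second attempt succeeds, which is the essential loop RITL describes.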

Key facts

  • The study introduces ManimTrainer, a training pipeline combining SFT and RL-based GRPO
  • ManimAgent is an inference pipeline featuring RITL (renderer-in-the-loop) and documentation-augmented RITL-DOC strategies
  • Research addresses LLM challenges with spatial reasoning and temporal sequencing in animation
  • Domain-specific APIs for Manim are underrepresented in general pre-training data
  • Unified reward signal fuses code and visual assessment signals
  • First unified training and inference study for text-to-code-to-video transformation with Manim
  • Evaluates 17 open-source models
  • Published on arXiv with identifier 2604.18364v1
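The unified reward described above can be sketched as a weighted combination of code-level checks and a visual quality score. The specific components and weights below are illustrative assumptions; the paper's actual reward design is not specified here.

```python
# Hedged sketch of a unified reward fusing code and visual signals.
# Components (compiles, renders, visual_score) and weights are
# illustrative assumptions, not the paper's exact formulation.

def unified_reward(compiles: bool, renders: bool, visual_score: float,
                   w_code: float = 0.4, w_visual: float = 0.6) -> float:
    """Combine binary code checks with a visual quality score in [0, 1]
    into a single scalar reward for GRPO-style policy optimization."""
    code_score = 0.5 * compiles + 0.5 * renders
    return w_code * code_score + w_visual * visual_score
```

A scalar reward like this is what GRPO needs: it scores each sampled completion so that completions within a group can be compared against the group's average.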

Entities

Institutions

  • arXiv
