CutVerse Benchmark Tests AI Agents on Media Editing Tasks
Researchers have introduced CutVerse, a benchmark for evaluating autonomous GUI agents in professional media post-production environments. It comprises expert demonstrations of 186 complex tasks grounded in authentic editing workflows, spanning seven professional applications including Premiere Pro and Photoshop. A lightweight parser converts screen recordings and interaction logs into structured action trajectories. In evaluations, existing agents reach a task success rate of only 36.0%, highlighting how far current AI systems are from handling creative workflows reliably.
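The summary does not describe how the parser works. The sketch below shows one plausible way to fold raw interaction-log events into a structured action trajectory. The event schema (`RawEvent`), the action vocabulary (`click`, `drag`, `type`, `hotkey`) and the function `parse_trajectory` are all hypothetical illustrations, not CutVerse's actual format or code. The sketch covers only the interaction-log side and ignores the screen recordings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RawEvent:
    # Hypothetical low-level record from an interaction log (not the CutVerse schema).
    t: float               # timestamp in seconds
    kind: str              # "mouse_down", "mouse_up", or "key"
    x: int = 0
    y: int = 0
    key: str = ""
    modifiers: tuple = ()  # e.g. ("ctrl",) for a keyboard shortcut


@dataclass
class Action:
    # One step of a structured trajectory (hypothetical action vocabulary).
    t: float
    name: str              # "click", "drag", "type", or "hotkey"
    args: dict = field(default_factory=dict)


def parse_trajectory(events: list[RawEvent]) -> list[Action]:
    """Fold raw mouse/keyboard events into higher-level actions, in time order."""
    actions: list[Action] = []
    pressed: Optional[RawEvent] = None  # pending mouse_down awaiting its mouse_up
    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "mouse_down":
            pressed = ev
        elif ev.kind == "mouse_up" and pressed is not None:
            # Same position on press and release -> click; otherwise -> drag.
            if (ev.x, ev.y) == (pressed.x, pressed.y):
                actions.append(Action(pressed.t, "click", {"x": ev.x, "y": ev.y}))
            else:
                actions.append(Action(pressed.t, "drag", {
                    "start": (pressed.x, pressed.y), "end": (ev.x, ev.y)}))
            pressed = None
        elif ev.kind == "key":
            if ev.modifiers:
                # Modifier + key becomes a single shortcut action.
                combo = "+".join((*ev.modifiers, ev.key))
                actions.append(Action(ev.t, "hotkey", {"keys": combo}))
            elif actions and actions[-1].name == "type":
                # Merge consecutive plain keystrokes into one "type" action.
                actions[-1].args["text"] += ev.key
            else:
                actions.append(Action(ev.t, "type", {"text": ev.key}))
    return actions


log = [
    RawEvent(0.0, "mouse_down", x=120, y=40),
    RawEvent(0.1, "mouse_up", x=120, y=40),
    RawEvent(0.5, "key", key="c"),
    RawEvent(0.6, "key", key="u"),
    RawEvent(0.7, "key", key="t"),
    RawEvent(1.2, "key", key="s", modifiers=("ctrl",)),
    RawEvent(2.0, "mouse_down", x=300, y=200),
    RawEvent(2.4, "mouse_up", x=420, y=200),
]
for a in parse_trajectory(log):
    print(a.name, a.args)
```

Run on the sample log, this yields four actions: a click, a `type` action with the text "cut", a `ctrl+s` hotkey and a drag. Collapsing low-level events this way keeps trajectories short enough to compare against an agent's actions step by step.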
Key facts
- CutVerse is a benchmark for evaluating GUI agents in media post-production.
- It covers 7 professional applications including Premiere Pro and Photoshop.
- The benchmark includes 186 complex, long-horizon tasks.
- A parser converts screen recordings and interaction logs into structured trajectories.
- Existing agents reach a task success rate of only 36.0%.
- The paper is available on arXiv under the identifier 2605.19484.
- The work shows that AI agent capabilities in creative workflows remain underexplored.
- The benchmark focuses on realistic media editing environments.
Entities
Institutions
- arXiv