RoboEval: A Scalable Benchmark for Robotic Manipulation Evaluation

other · 2026-05-07

RoboEval is an evaluation framework and benchmark for robotic manipulation that goes beyond binary success indicators. It defines behavioral and outcome metrics that assess both execution quality and the nature of failures. The framework comprises eight bimanual tasks with systematically controlled variations, more than three thousand expert demonstrations, and a modular simulation platform. Standardized metrics measure efficiency, coordination, safety/stability, and stagewise progress. Experiments with state-of-the-art visuomotor policies indicate that the metrics are stable under variation, discriminate between policies, and relate to success rates.

Key facts

RoboEval augments binary success with behavioral and outcome metrics.
Includes eight bimanual tasks with systematically controlled variations.
Provides more than three thousand expert demonstrations.
Features a modular simulation platform for reproducible experimentation.
Metrics quantify efficiency, coordination, safety/stability, and stagewise progress.
Validated through experiments with state-of-the-art visuomotor policies.
Metrics show stability under variation and discriminative power across policies.
Framework localizes failure modes in robotic manipulation.
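
To make the idea of augmenting binary success concrete, here is a minimal sketch of how an episode could be scored on stagewise progress and a simple efficiency proxy alongside the success flag. It is not RoboEval's API: the Episode record, its field names, and the choice of end-effector path length as the efficiency measure are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Episode:
    # Hypothetical record; field names are illustrative, not taken from RoboEval.
    stages_done: Sequence[bool]   # ordered subgoals, e.g. grasp, lift, place
    left_path: Sequence[Point]    # left end-effector positions, one per step
    right_path: Sequence[Point]   # right end-effector positions, one per step


def stage_progress(ep: Episode) -> float:
    """Fraction of ordered stages completed before the first failed stage."""
    if not ep.stages_done:
        return 0.0
    done = 0
    for flag in ep.stages_done:
        if not flag:
            break
        done += 1
    return done / len(ep.stages_done)


def path_length(path: Sequence[Point]) -> float:
    """Total Euclidean distance travelled along a sequence of positions."""
    return sum(dist(a, b) for a, b in zip(path, path[1:]))


def summarize(ep: Episode) -> dict:
    """Report binary success next to behavioral and outcome measures."""
    return {
        "success": all(ep.stages_done),
        "stage_progress": stage_progress(ep),
        "path_length": path_length(ep.left_path) + path_length(ep.right_path),
    }
```

Under these assumptions, two failed episodes with the same success value of False can still be told apart: one that stalls at the first stage scores a stage_progress of 0.0, while one that fails only at the last stage scores close to 1.0, which is the kind of failure localization the key facts describe.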

Sources

arXiv cs.AI — 2026-05-06