Mini-BEHAVIOR-Gran Benchmark Reveals U-Shaped Performance Curve in AI Instruction Granularity
Researchers have introduced a new benchmark, Mini-BEHAVIOR-Gran, to study how instruction granularity affects language-conditioned embodied AI agents. The benchmark extends the Mini-BEHAVIOR framework with multiple instruction variants per task, ranging from high-level goal descriptions to detailed step-by-step guidance. The team compared four metrics for quantifying granularity across tasks: token count, action-verb count, entity count, and planning-width. Of these, planning-width correlated most consistently with agent performance. Structuring training and evaluation around planning-width revealed a U-shaped relationship between instruction granularity and performance, with peaks at both the fine-grained and coarse-grained ends. The study, available as arXiv preprint 2604.17019v1, addresses a significant gap in embodied AI evaluation: most existing benchmarks use a single static instruction per task.
Key facts
- Mini-BEHAVIOR-Gran is a new benchmark for studying instruction granularity in embodied AI
- The benchmark extends Mini-BEHAVIOR with multiple instruction variants per task
- Instruction variants range from high-level goal descriptions to step-by-step guidance
- Four metrics were compared for cross-task granularity quantification
- Planning-width correlated most consistently with agent performance
- A U-shaped relationship exists between instruction granularity and performance
- Performance peaks at both fine-grained and coarse-grained instruction levels
- Existing benchmarks typically use single static instructions per task
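Three of the four granularity metrics above are surface statistics that are straightforward to compute. The sketch below is illustrative only: the verb list and the semicolon-delimited step heuristic (used here as a crude stand-in for planning-width) are assumptions for demonstration, not the paper's actual definitions.

```python
# Hedged sketch of instruction-granularity metrics.
# ACTION_VERBS and the ";"-based step heuristic are assumptions,
# not the metric definitions from the Mini-BEHAVIOR-Gran paper.

ACTION_VERBS = {"pick", "place", "open", "close", "move", "put", "go", "grab"}

def granularity_metrics(instruction: str) -> dict:
    """Compute rough granularity metrics for one instruction string."""
    cleaned = instruction.lower()
    for ch in ",.;":
        cleaned = cleaned.replace(ch, " ")
    tokens = cleaned.split()
    # Treat semicolon-separated clauses as sub-steps (illustrative proxy
    # for planning-width, which the paper defines more precisely).
    steps = [s for s in instruction.split(";") if s.strip()]
    return {
        "token_count": len(tokens),
        "action_verb_count": sum(t in ACTION_VERBS for t in tokens),
        "step_count": len(steps),
    }

coarse = "Clean the table."
fine = ("Go to the table; pick up the cup; open the cabinet; "
        "place the cup inside; close the cabinet.")
print(granularity_metrics(coarse))  # few tokens, no listed verbs, one step
print(granularity_metrics(fine))    # many tokens, several verbs and steps
```

A coarse instruction scores low on all three counts, while a fine-grained one scores high, which is exactly the spread across instruction variants the benchmark is designed to probe.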