Drive-P2D: New Benchmark Tests VLMs in Autonomous Driving

ai-technology · 2026-05-27

Drive-P2D is a benchmark for assessing vision-language models (VLMs) in autonomous driving, described in a paper on arXiv (2601.14702). It contains 6,650 questions organized into three progressive levels: Object, Scene, and Decision. The benchmark uses a separated reasoning-and-answer protocol: final answers are scored objectively, while the reasoning is analyzed separately to identify error modes along the perception-to-decision chain. It evaluates mainstream VLMs both across all scenarios and in high-risk scenarios, characterizing the boundary of their perception-to-decision capability. The work targets three shortcomings of existing benchmarks: they evaluate perception and decision-making in isolation, limit failure analysis to choice-only formats, or introduce bias through long-form outputs scored by LLMs.

Key facts

Drive-P2D is a progressive perception-to-decision benchmark for VLMs in autonomous driving.
It contains 6,650 questions across Object, Scene, and Decision levels.
The benchmark uses a separated reasoning-and-answer protocol.
Final answers are scored objectively; reasoning is analyzed for error modes.
It evaluates mainstream VLMs both across all scenarios and in high-risk scenarios.
It characterizes the perception-to-decision capability boundary.
The paper is available on arXiv with ID 2601.14702.
It addresses limitations of existing benchmarks in autonomous driving.
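
To make the separated reasoning-and-answer protocol concrete, here is a minimal Python sketch of how such scoring might work. It is not the paper's implementation: the response format (free-form reasoning followed by a final "Answer: <letter>" line), the A-D option letters, the field names, and the functions split_response and score are all assumptions made for illustration. The idea it shows is that only the final answer is scored, per level, while the reasoning of wrong answers is set aside for separate error analysis.

```python
import re
from collections import defaultdict

# Hypothetical response format: reasoning text, then a line "Answer: <letter>".
ANSWER_RE = re.compile(r"^Answer:\s*([A-D])\s*$", re.MULTILINE)


def split_response(text):
    """Return (reasoning, answer); answer is None if no answer line is found."""
    matches = list(ANSWER_RE.finditer(text))
    if not matches:
        return text.strip(), None
    last = matches[-1]  # use the last answer line if several appear
    return text[:last.start()].strip(), last.group(1)


def score(items):
    """Compute per-level accuracy from final answers only.

    The reasoning of every incorrectly answered item is collected so it
    can be inspected separately for error modes.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    to_review = []
    for item in items:
        reasoning, answer = split_response(item["response"])
        total[item["level"]] += 1
        if answer == item["gold"]:
            correct[item["level"]] += 1
        else:
            to_review.append((item["level"], reasoning))
    accuracy = {level: correct[level] / total[level] for level in total}
    return accuracy, to_review


# Made-up items for illustration only; not taken from the benchmark.
items = [
    {"level": "Object", "gold": "A",
     "response": "The signal ahead is red.\nAnswer: A"},
    {"level": "Scene", "gold": "B",
     "response": "The road looks clear.\nAnswer: C"},
    {"level": "Decision", "gold": "D",
     "response": "I would slow down."},  # no answer line: counted as wrong
]

accuracy, to_review = score(items)
print(accuracy)
print(to_review)
```

Keeping answer scoring as an exact match avoids relying on an LLM judge for the headline number, while the to_review list preserves the reasoning needed to attribute each failure to a stage of the perception-to-decision chain.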
