VLA Driving Models Show 42.5% Reasoning Fidelity, 94 Missed Pedestrians

ai-technology · 2026-05-20

A systematic investigation into the reliability of Vision-Language-Action (VLA) driving models has found significant gaps in their reasoning. Researchers evaluated 300 Alpamayo-R1-10B inferences across 100 PhysicalAI-AV scenarios and measured an overall reasoning fidelity of just 42.5%: the model's Chain-of-Causation reasoning matched the actual scene less than half the time. The model missed pedestrians 94 times, in one-third of the pedestrian-relevant scenarios, and trajectory fragility reached 97.7% under mild visual perturbations. Mean consistency between reasoning and action was only 48.3%, and 53.3% of inferences showed low consistency. In 37.9% of the cases where the reasoning called for a stop, the model continued instead. The paper is the first systematic analysis of faithfulness in VLA driving models; it establishes information-theoretic definitions of fidelity and proposes a four-component safety framework.

Key facts

First systematic study of faithfulness in VLA driving models
Analyzed 300 Alpamayo-R1-10B inferences across 100 PhysicalAI-AV scenarios
Overall reasoning fidelity is 42.5%
94 missed pedestrians in one-third of pedestrian-relevant scenes
97.7% trajectory fragility under mild visual perturbations
48.3% mean reasoning-action consistency
53.3% of inferences exhibit low consistency
In 37.9% of cases where the reasoning claims a stop, the model continues instead
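The last fact (continuing despite a stop claim) and the low-consistency share are proportions over per-inference records, the first of them conditional on the reasoning claiming a stop. The Python sketch below illustrates that bookkeeping under stated assumptions. The `Inference` record, its field names, and the 0.5 cutoff for "low consistency" are hypothetical. The study's information-theoretic fidelity definitions are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Inference:
    # Hypothetical per-inference record; field names are illustrative only.
    claims_stop: bool        # reasoning text says the vehicle should stop
    trajectory_stops: bool   # predicted trajectory actually comes to a halt
    consistency: float       # reasoning-action consistency score in [0, 1]


def stop_claim_violation_rate(records: list[Inference]) -> float:
    """Fraction of stop-claiming inferences whose trajectory keeps moving."""
    stop_claimed = [r for r in records if r.claims_stop]
    if not stop_claimed:
        return 0.0
    continued = sum(1 for r in stop_claimed if not r.trajectory_stops)
    return continued / len(stop_claimed)


def consistency_summary(records: list[Inference],
                        low_threshold: float = 0.5) -> tuple[float, float]:
    """Mean consistency, and share of inferences below an ASSUMED threshold."""
    if not records:
        return 0.0, 0.0
    scores = [r.consistency for r in records]
    mean = sum(scores) / len(scores)
    low_share = sum(1 for s in scores if s < low_threshold) / len(scores)
    return mean, low_share


if __name__ == "__main__":
    # Toy records for illustration; these are NOT data from the study.
    toy = [
        Inference(claims_stop=True, trajectory_stops=False, consistency=0.2),
        Inference(claims_stop=True, trajectory_stops=True, consistency=0.9),
        Inference(claims_stop=False, trajectory_stops=False, consistency=0.6),
        Inference(claims_stop=True, trajectory_stops=True, consistency=0.4),
    ]
    print(f"stop-claim violations: {stop_claim_violation_rate(toy):.1%}")
    mean, low = consistency_summary(toy)
    print(f"mean consistency: {mean:.1%}, low-consistency share: {low:.1%}")
```

In the toy data, one of the three stop-claiming records keeps moving. The denominator is only the stop-claimed cases, not all inferences. This is why the 37.9% figure cannot be compared directly with the 53.3% low-consistency share.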
