ARTFEED — Contemporary Art Intelligence

VLA Driving Models Show 42.5% Reasoning Fidelity, 94 Missed Pedestrians

ai-technology · 2026-05-20

A study of the reliability of Vision-Language-Action (VLA) driving models has found notable deficiencies in their reasoning. Researchers evaluated 300 Alpamayo-R1-10B inferences across 100 PhysicalAI-AV scenarios and measured an overall reasoning fidelity of just 42.5%. The model's Chain-of-Causation reasoning matched the actual scene less than half the time. The study counted 94 missed pedestrians across one-third of pedestrian-relevant scenes, reported 97.7% trajectory fragility under mild visual perturbations, and measured a mean reasoning-action consistency of only 48.3%. Consistency was low in 53.3% of inferences, and in 37.9% of the cases where the reasoning claimed a stop, the model continued instead. The paper is the first systematic study of faithfulness in VLA driving models; it establishes information-theoretic definitions of fidelity and proposes a four-component safety framework.

Key facts

  • First systematic study of faithfulness in VLA driving models
  • Analyzed 300 Alpamayo-R1-10B inferences across 100 PhysicalAI-AV scenarios
  • Overall reasoning fidelity is 42.5%
  • 94 missed pedestrians in one-third of pedestrian-relevant scenes
  • 97.7% trajectory fragility under mild visual perturbations
  • 48.3% mean reasoning-action consistency
  • 53.3% of inferences exhibit low consistency
  • In 37.9% of stop-claimed cases, the model continues instead
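
To make the reasoning-action consistency figures above more concrete, here is a minimal Python sketch of how a "stop claimed but model continues" rate could be tallied. It assumes each inference has already been reduced to two categorical labels; the `Inference` class, its field names, the label values, and the toy data are all hypothetical and are not taken from the study. The paper's own consistency measure is information-theoretic and may differ substantially from this simple label matching.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Inference:
    # Hypothetical labels, not the study's schema:
    claimed_action: str   # action stated in the model's reasoning text
    executed_action: str  # action implied by the predicted trajectory


def stop_violation_rate(inferences: List[Inference]) -> float:
    """Share of stop-claimed inferences whose executed action is not a stop."""
    stop_claimed = [i for i in inferences if i.claimed_action == "stop"]
    if not stop_claimed:
        return 0.0
    violations = sum(1 for i in stop_claimed if i.executed_action != "stop")
    return violations / len(stop_claimed)


def agreement_rate(inferences: List[Inference]) -> float:
    """Share of inferences where the claimed and executed actions match."""
    if not inferences:
        return 0.0
    matches = sum(1 for i in inferences if i.claimed_action == i.executed_action)
    return matches / len(inferences)


# Toy data for illustration only; these are not results from the study.
toy = [
    Inference("stop", "stop"),
    Inference("stop", "continue"),
    Inference("continue", "continue"),
    Inference("yield", "continue"),
]

print(stop_violation_rate(toy))  # 1 of 2 stop-claimed cases continues
print(agreement_rate(toy))       # 2 of 4 inferences agree
```

Note that the denominator for the violation rate is the set of stop-claimed inferences, not all inferences, which is how the 37.9% figure is phrased in the key facts.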

Entities

Institutions

  • PhysicalAI
  • arXiv

Sources