ARTFEED — Contemporary Art Intelligence

VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving

other · 2026-05-20

A new framework called VL-DPO uses vision-language models (VLMs) to align ego-vehicle motion forecasting for autonomous driving with human preferences. A VLM acting as a zero-shot reasoner automatically generates preference pairs from a pretrained model's rollouts, and the model is then finetuned on those pairs with Direct Preference Optimization (DPO). Models are finetuned on the Waymo Open End-to-End Driving Dataset (WOD-E2E) and evaluated against held-out human preference annotations. The work targets a limitation of standard imitation learning, which struggles to capture nuanced driving preferences.
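
The pair-generation step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the summary does not say how the VLM expresses its judgment, so the sketch assumes a hypothetical `judge` callable that returns one scalar score per rollout, and the `Trajectory` waypoint format, the `min_gap` threshold, and the function name are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical trajectory format: a sequence of (x, y) waypoints.
Trajectory = Sequence[Tuple[float, float]]


def build_preference_pairs(
    rollouts: Sequence[Trajectory],
    judge: Callable[[Trajectory], float],
    min_gap: float = 0.0,
) -> List[Tuple[Trajectory, Trajectory]]:
    """Return (chosen, rejected) pairs from a set of rollouts.

    `judge` stands in for the VLM zero-shot reasoner (assumption):
    a higher score means the rollout is preferred. Pairs whose score
    gap is at most `min_gap` are dropped as too ambiguous to label.
    """
    scored = [(judge(r), r) for r in rollouts]
    pairs = []
    for (score_a, a), (score_b, b) in combinations(scored, 2):
        if abs(score_a - score_b) <= min_gap:
            continue  # no clear preference between these two rollouts
        pairs.append((a, b) if score_a > score_b else (b, a))
    return pairs


# Toy usage: the stand-in judge prefers a small final lateral offset.
rollouts = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.0), (1.0, 2.0)],
    [(0.0, 0.0), (1.0, 0.5)],
]
toy_judge = lambda traj: -abs(traj[-1][1])
pairs = build_preference_pairs(rollouts, toy_judge)
```

In a real pipeline the judge would wrap a VLM call that sees the scene context along with the candidate trajectories; the toy judge here only exists to make the sketch runnable.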

Key facts

  • VL-DPO is a vision-language-guided framework for aligning ego-vehicle motion forecasting models with human preferences.
  • It uses a VLM as a zero-shot reasoner to automatically generate preference pairs from a pretrained model's rollouts.
  • Finetuning is performed via Direct Preference Optimization (DPO).
  • Models are finetuned on the Waymo Open End-to-End Driving Dataset (WOD-E2E).
  • Performance is evaluated against held-out human preference annotations.
  • The approach aims to capture complex nuances of human driving preferences beyond standard imitation objectives.
  • The paper is available on arXiv under ID 2605.20082.
  • The work builds on recent advances in vision-language models (VLMs) for reasoning and commonsense understanding.
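
For readers unfamiliar with DPO, the standard per-pair objective can be sketched in a few lines of Python. This is the generic DPO loss, not code from the paper: the function name, the argument names, and the `beta` default of 0.1 are assumptions, and how VL-DPO computes trajectory log-probabilities is not stated in this summary.

```python
import math


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls deviation from the reference
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    where each term is a log-probability under the finetuned policy
    (pi) or the frozen pretrained reference model (ref).
    """
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; the loss falls as the policy assigns relatively more probability to the chosen trajectory than the reference does.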

Entities

Institutions

  • arXiv
  • Waymo

Sources