PedestrianQA: New Benchmark Tests VLMs on Pedestrian Prediction

publication · 2026-05-26

Researchers introduced PedestrianQA, a large-scale video-based dataset that reformulates pedestrian intention and trajectory prediction as question-answering tasks with structured rationales. The dataset enables vision-language models (VLMs) to learn from visual dynamics, contextual cues, and interactions with other traffic agents while generating concise explanations. Evaluations were conducted on the PIE, JAAD, TITAN, and IDD-PeD datasets.

Key facts

PedestrianQA is a large-scale video-based dataset for pedestrian intention and trajectory prediction.
It formulates prediction as question-answering tasks with structured rationales.
The dataset enables VLMs to learn from visual dynamics, contextual cues, and traffic agent interactions.
Evaluations were performed on the PIE, JAAD, TITAN, and IDD-PeD datasets.
The work is published on arXiv with ID 2605.24562.
Pedestrian intention and trajectory prediction are critical for autonomous driving safety.
Recent advances in large vision-language models offer a new paradigm for these tasks.
PedestrianQA expresses annotated pedestrian sequences in natural language.
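The summary does not describe PedestrianQA's actual annotation schema, question templates, or rationale format. To illustrate the general idea of expressing an annotated pedestrian sequence as question-answer pairs with a rationale, here is a minimal sketch under stated assumptions. It assumes a hypothetical per-pedestrian record with observed and future bounding boxes, a binary crossing label, and a few free-form context tags. All names (PedestrianTrack, to_qa_samples, describe_motion, the cue keys) are illustrative and are not the dataset's real format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
BBox = Tuple[float, float, float, float]


@dataclass
class PedestrianTrack:
    """Hypothetical annotation for one pedestrian (not PedestrianQA's real schema)."""
    ped_id: str
    observed_boxes: List[BBox]  # past frames the model gets to see
    future_boxes: List[BBox]    # frames whose positions must be predicted
    will_cross: bool            # intention label
    cues: Dict[str, str] = field(default_factory=dict)  # e.g. {"crosswalk": "present"}


def box_center(b: BBox) -> Tuple[float, float]:
    """Return the center point of a bounding box."""
    x1, y1, x2, y2 = b
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def describe_motion(boxes: List[BBox], threshold: float = 5.0) -> str:
    """Summarize horizontal displacement of the box center over the observed window."""
    (x0, _), (xn, _) = box_center(boxes[0]), box_center(boxes[-1])
    dx = xn - x0
    if abs(dx) < threshold:
        return "roughly stationary"
    return "moving right" if dx > 0 else "moving left"


def to_qa_samples(track: PedestrianTrack) -> List[Dict[str, str]]:
    """Turn one annotated track into an intention QA pair and a trajectory QA pair."""
    motion = describe_motion(track.observed_boxes)
    cue_text = ", ".join(f"{k}: {v}" for k, v in sorted(track.cues.items())) or "none annotated"
    # A templated rationale shared by both questions (an assumption for illustration).
    rationale = f"The pedestrian is {motion}; context cues: {cue_text}."

    intention = {
        "question": f"Will pedestrian {track.ped_id} cross in front of the ego vehicle?",
        "answer": "yes" if track.will_cross else "no",
        "rationale": rationale,
    }
    future = "; ".join(f"({cx:.0f}, {cy:.0f})" for cx, cy in map(box_center, track.future_boxes))
    trajectory = {
        "question": (
            f"Where will pedestrian {track.ped_id} be over the next "
            f"{len(track.future_boxes)} frames?"
        ),
        "answer": future,
        "rationale": rationale,
    }
    return [intention, trajectory]
```

For example, a track whose box center drifts from x = 120 to x = 140 over three observed frames yields a rationale containing "moving right", and its trajectory answer lists the future box centers as text. Templated rationales like this are only a stand-in; the paper's structured rationales may be produced and organized quite differently.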
