PedestrianQA: New Benchmark Tests VLMs on Pedestrian Prediction
Researchers have introduced PedestrianQA, a large-scale video-based dataset that reformulates pedestrian intention and trajectory prediction as question-answering tasks with structured rationales. The dataset enables vision-language models (VLMs) to learn from visual dynamics, contextual cues, and interactions among traffic agents while generating concise explanations. Evaluations were conducted on the PIE, JAAD, TITAN, and IDD-PeD datasets.
Key facts
- PedestrianQA is a large-scale video-based dataset for pedestrian intention and trajectory prediction.
- It formulates prediction as question-answering tasks with structured rationales.
- The dataset enables VLMs to learn from visual dynamics, contextual cues, and interactions among traffic agents.
- Evaluations were performed on the PIE, JAAD, TITAN, and IDD-PeD datasets.
- The work is available on arXiv under ID 2605.24562.
- Pedestrian intention and trajectory prediction are critical for autonomous driving safety.
- Recent advances in large vision-language models offer a new paradigm for these tasks.
- PedestrianQA expresses annotated pedestrian sequences in natural language.
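The summary does not describe PedestrianQA's schema or question templates. As a minimal sketch only, assuming a hypothetical per-pedestrian annotation record (observed bounding boxes, future bounding boxes, a crossing label, and free-text context cues), one annotated sequence could be turned into an intention question and a trajectory question, each paired with a short templated rationale. The names `PedestrianTrack`, `to_qa_samples`, the field names, and the question wording are illustrative assumptions, not the dataset's actual format.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


# HYPOTHETICAL annotation record; field names are illustrative only.
@dataclass
class PedestrianTrack:
    ped_id: str
    boxes: list[Box]          # observed frames
    future_boxes: list[Box]   # frames in the prediction horizon
    will_cross: bool          # intention label
    context: list[str] = field(default_factory=list)  # e.g. "signal: red"


def box_center(box: Box) -> tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def motion_phrase(boxes: list[Box], still_px: float = 5.0) -> str:
    # Describe horizontal motion between the first and last observed centers.
    dx = box_center(boxes[-1])[0] - box_center(boxes[0])[0]
    if abs(dx) < still_px:
        return "is roughly stationary"
    return "is moving right" if dx > 0 else "is moving left"


def to_qa_samples(track: PedestrianTrack) -> list[dict]:
    """Express one annotated track as intention and trajectory QA pairs."""
    cues = "; ".join(track.context) if track.context else "no notable context"
    # Templated rationale built from annotations (an assumption of this sketch).
    rationale = f"Pedestrian {track.ped_id} {motion_phrase(track.boxes)}. Context: {cues}."
    intention = {
        "question": f"Will pedestrian {track.ped_id} cross in front of the ego vehicle?",
        "answer": "yes" if track.will_cross else "no",
        "rationale": rationale,
    }
    centers = [box_center(b) for b in track.future_boxes]
    trajectory = {
        "question": (f"Where will pedestrian {track.ped_id} be "
                     f"over the next {len(centers)} frames?"),
        "answer": centers,  # future box centers in pixels
        "rationale": rationale,
    }
    return [intention, trajectory]
```

For example, a track whose box centers drift rightward and whose `will_cross` label is true yields a "yes" intention answer, a list of future centers as the trajectory answer, and a rationale such as "Pedestrian p1 is moving right. Context: ...". A real pipeline would likely draw rationales from richer cues than this rule-based template.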