ARTFEED — Contemporary Art Intelligence

PedestrianQA: New Benchmark Tests VLMs on Pedestrian Prediction

publication · 2026-05-26

Researchers introduced PedestrianQA, a large-scale video-based dataset that reformulates pedestrian intention and trajectory prediction as question-answering tasks with structured rationales. The dataset enables vision-language models (VLMs) to learn from visual dynamics, contextual cues, and interactions with other traffic agents while generating concise explanations. Evaluations were conducted on the PIE, JAAD, TITAN, and IDD-PeD datasets.

Key facts

  • PedestrianQA is a large-scale video-based dataset for pedestrian intention and trajectory prediction.
  • It formulates prediction as question-answering tasks with structured rationales.
  • The dataset enables VLMs to learn from visual dynamics, contextual cues, and traffic agent interactions.
  • Evaluations were performed on the PIE, JAAD, TITAN, and IDD-PeD datasets.
  • The work is published on arXiv with ID 2605.24562.
  • Pedestrian intention and trajectory prediction are critical for autonomous driving safety.
  • Recent advances in large vision-language models offer a new paradigm for these tasks.
  • PedestrianQA expresses annotated pedestrian sequences in natural language.
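The core idea here is expressing annotated pedestrian sequences in natural language and posing prediction as question answering with a rationale. The snippet below is a minimal, hypothetical sketch of that idea. It assumes a simplified annotation made of per-frame bounding boxes plus a binary crossing label. The class and function names, question templates, rationale wording, and 5-pixel motion threshold are all illustrative assumptions. None of them come from the PedestrianQA paper, its data format, or the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical annotation format: (x1, y1, x2, y2) box in pixels.
BBox = Tuple[float, float, float, float]


@dataclass
class PedestrianTrack:
    ped_id: str
    boxes: List[BBox]         # observed boxes, one per frame
    will_cross: bool          # intention label for the prediction horizon
    future_boxes: List[BBox]  # ground-truth boxes over the horizon


def center(box: BBox) -> Tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def horizontal_motion(boxes: List[BBox], threshold: float = 5.0) -> str:
    """Crude motion cue: compare first and last box centers along x.

    The 5-pixel threshold is an arbitrary assumption for illustration.
    """
    dx = center(boxes[-1])[0] - center(boxes[0])[0]
    if abs(dx) < threshold:
        return "roughly stationary"
    return "moving right" if dx > 0 else "moving left"


def to_qa(track: PedestrianTrack) -> List[dict]:
    """Turn one annotated track into an intention QA and a trajectory QA."""
    motion = horizontal_motion(track.boxes)
    intention_qa = {
        "question": f"Will pedestrian {track.ped_id} cross in front of the ego vehicle?",
        "answer": "yes" if track.will_cross else "no",
        "rationale": f"Over {len(track.boxes)} observed frames the pedestrian is {motion}.",
    }
    future = ", ".join(f"({x:.0f}, {y:.0f})" for x, y in map(center, track.future_boxes))
    trajectory_qa = {
        "question": f"Where will pedestrian {track.ped_id} be over the next "
                    f"{len(track.future_boxes)} frames?",
        "answer": f"Box centers: {future}",
        "rationale": f"Consistent with the observed motion: {motion}.",
    }
    return [intention_qa, trajectory_qa]


# Usage with a made-up track of a pedestrian drifting to the right.
example = PedestrianTrack(
    ped_id="p1",
    boxes=[(100, 200, 140, 300), (110, 200, 150, 300), (120, 200, 160, 300)],
    will_cross=True,
    future_boxes=[(130, 200, 170, 300), (140, 200, 180, 300)],
)
qa = to_qa(example)
for item in qa:
    print(item["question"], "->", item["answer"], "|", item["rationale"])
```

A real dataset of this kind would likely draw on much richer cues, such as scene context and other road users, than a single horizontal-motion heuristic. The sketch only shows the general shape of a question, answer, and rationale triple.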

Entities

Institutions

  • arXiv

Sources