ARTFEED — Contemporary Art Intelligence

Vision-Language Models Automate Crash Diagram Generation for Transportation Safety

ai-technology · 2026-04-20

A study published on arXiv (2604.15332v1) shows how Vision-Language Models (VLMs) can automate the creation of crash diagrams from police reports, a task that is slow and inconsistent when done by hand. Using multilane roundabouts as a deliberately complex test case, the researchers developed a structured prompt framework that guides a model through three steps: interpretation, extraction, and visual synthesis. Three models (GPT-4o, Gemini-1.5-Flash, and Janus-4o) were evaluated on 79 crash reports using a 10-metric system covering semantic accuracy, spatial fidelity, and visual clarity. GPT-4o achieved the highest average performance score, 6.29 out of 10, followed by Gemini-1.5-Flash at 5.28 and Janus-4o at 3.64. The analysis attributed GPT-4o's lead to stronger spatial reasoning and closer agreement between the data it extracted and the diagrams it drew. The study presents automated diagram generation as a way to reduce the manual effort involved in transportation safety analysis.
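As a rough illustration of the three steps named above, the following Python sketch builds one prompt per step for a given report. Only the step names (interpretation, extraction, visual synthesis) come from the article; the `Stage` class, the `build_prompts` helper, the instruction wording, and the sample report text are all hypothetical and are not the study's actual prompts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    instruction: str


# Stage names follow the article; the instruction text is invented for illustration.
STAGES = (
    Stage("interpretation", "Read the crash report and summarize what happened."),
    Stage("extraction", "List each vehicle, its lane, its direction, and the point of impact."),
    Stage("visual synthesis", "Draw a crash diagram of the roundabout from the extracted fields."),
)


def build_prompts(report_text: str) -> list[str]:
    """Return one prompt per stage, in order, each embedding the report text."""
    prompts = []
    for number, stage in enumerate(STAGES, start=1):
        prompts.append(
            f"Step {number} ({stage.name}): {stage.instruction}\n\nReport:\n{report_text}"
        )
    return prompts


if __name__ == "__main__":
    # Toy report text, not taken from the study's data.
    sample = "Vehicle 1 was in the inner lane; Vehicle 2 changed lanes and struck it."
    for prompt in build_prompts(sample):
        print(prompt)
        print("-" * 40)
```

Keeping each step as a separate prompt makes it possible to inspect the extracted fields before any diagram is drawn, which is one plausible way to check that the drawing agrees with the extraction.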

Key facts

  • Study published on arXiv with identifier 2604.15332v1
  • Focuses on automating crash diagram generation from police reports
  • Uses multilane roundabouts as a challenging test case
  • Develops a three-part structured prompt framework for model reasoning
  • Creates a 10-metric evaluation system for diagram quality assessment
  • Tests three models: GPT-4o, Gemini-1.5-Flash, and Janus-4o
  • Evaluates the models on 79 crash reports
  • GPT-4o achieved highest average performance score of 6.29/10
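The article does not say how the 10 metrics combine into a score out of 10. One simple reading, assumed here and not confirmed by the source, is that each metric contributes between 0 and 1 point per diagram and that a model's reported figure is the mean of those per-diagram totals across all reports. The Python sketch below works through that assumption with toy data; the function names and values are hypothetical.

```python
from statistics import mean

NUM_METRICS = 10  # from the article; the individual metric names are not given


def diagram_score(metric_points):
    """Sum ten per-metric points (each assumed to be in [0, 1]) into a 0-10 score."""
    if len(metric_points) != NUM_METRICS:
        raise ValueError(f"expected {NUM_METRICS} metric values")
    if any(not 0.0 <= p <= 1.0 for p in metric_points):
        raise ValueError("each metric value must be between 0 and 1")
    return float(sum(metric_points))


def average_score(per_report_points):
    """Mean diagram score across reports, rounded to two decimals."""
    return round(mean(diagram_score(points) for points in per_report_points), 2)


# Toy data for three reports, not results from the study.
toy_reports = [
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # diagram scores 6
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # diagram scores 7
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # diagram scores 5
]

print(average_score(toy_reports))  # 6.0
```

Under this reading, a score such as 6.29 would mean that a model's diagrams satisfied a little over six of the ten metrics on average across the 79 reports.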

Entities

Institutions

  • arXiv
