Vision-Language Models Automate Crash Diagram Generation for Transportation Safety

ai-technology · 2026-04-20

A study published on arXiv (2604.15332v1) shows how Vision-Language Models can automate the creation of crash diagrams from police reports, a task that is time-consuming and inconsistent when done by hand. Using multilane roundabouts as a deliberately complex test case, the researchers developed a structured prompt framework that guides models through three stages: interpretation, extraction, and visual synthesis. Three models (GPT-4o, Gemini-1.5-Flash, and Janus-4o) were evaluated on 79 crash reports using a 10-metric system covering semantic accuracy, spatial fidelity, and visual clarity. GPT-4o achieved the highest average score, 6.29 out of 10, followed by Gemini-1.5-Flash at 5.28 and Janus-4o at 3.64. The analysis credited GPT-4o with stronger spatial reasoning and close alignment between the data it extracted and the diagrams it drew. The work illustrates how AI could streamline a routine but labor-intensive step in transportation safety analysis.
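
The summary names the framework's three stages but not its prompts. The following is a minimal Python sketch, assuming the stages are chained so that each model reply feeds the next prompt. The prompt wording, the JSON fields (vehicles, entry_leg, lane, maneuver, collision_type), and the names build_diagram_request, DiagramRequest, and ModelFn are all hypothetical, and the model call is left as a plain callable rather than any vendor's API.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: any function that sends a prompt string to a
# vision-language model and returns its text reply.
ModelFn = Callable[[str], str]

# Illustrative prompt templates for the three stages; the study's actual
# prompt wording is not reproduced here.
INTERPRET_PROMPT = (
    "Read the following police crash report and summarize, in plain language, "
    "what happened at the roundabout:\n{report}"
)
EXTRACT_PROMPT = (
    "From this summary, return JSON with keys 'vehicles' (a list of objects with "
    "'id', 'entry_leg', 'lane', 'maneuver') and 'collision_type':\n{summary}"
)
SYNTHESIZE_PROMPT = (
    "Draw a top-down crash diagram of a multilane roundabout using exactly "
    "these elements, labeling each vehicle by its id:\n{elements}"
)


@dataclass
class DiagramRequest:
    summary: str         # stage 1 output: plain-language interpretation
    elements: dict       # stage 2 output: structured crash elements
    drawing_prompt: str  # stage 3 output: prompt for the image-generation step


def build_diagram_request(report: str, model: ModelFn) -> DiagramRequest:
    # Stage 1: interpret the free-text police narrative.
    summary = model(INTERPRET_PROMPT.format(report=report))
    # Stage 2: extract structured fields; json.loads raises an error if the
    # model does not return valid JSON, so bad extractions fail loudly.
    elements = json.loads(model(EXTRACT_PROMPT.format(summary=summary)))
    # Stage 3: build the visual-synthesis prompt from the extracted elements
    # only, so the drawing can later be checked against the same data.
    drawing_prompt = SYNTHESIZE_PROMPT.format(
        elements=json.dumps(elements, sort_keys=True)
    )
    return DiagramRequest(summary, elements, drawing_prompt)
```

Keeping the stage-2 output as structured data makes it possible to compare a finished drawing against the same elements, the kind of data-to-visual alignment the study highlighted in GPT-4o's results.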

Key facts

Study published on arXiv with identifier 2604.15332v1
Focuses on automating crash diagram generation from police reports
Uses multilane roundabouts as challenging test case
Developed three-part structured prompt framework for model reasoning
Created 10-metric evaluation system for diagram quality assessment
Tested three models: GPT-4o, Gemini-1.5-Flash, and Janus-4o
Evaluated on 79 crash reports
GPT-4o achieved highest average performance score of 6.29/10
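
The summary reports averages out of 10 but does not say how the ten metrics are weighted or combined. Assuming, purely for illustration, that each metric is scored between 0 and 1 and a diagram's score is their unweighted sum, per-model averages and a ranking could be computed as below. The helpers diagram_score, average_score, and rank_models are hypothetical, not the study's code.

```python
from statistics import mean

# Assumption: 10 metrics, each scored in [0, 1], summed to a 0-10 diagram score.
NUM_METRICS = 10


def diagram_score(metric_scores: list[float]) -> float:
    """Sum one diagram's metric scores after validating their count and range."""
    if len(metric_scores) != NUM_METRICS:
        raise ValueError(f"expected {NUM_METRICS} metric scores, got {len(metric_scores)}")
    if any(not 0.0 <= s <= 1.0 for s in metric_scores):
        raise ValueError("each metric score must lie in [0, 1]")
    return sum(metric_scores)


def average_score(per_report: list[list[float]]) -> float:
    """Average the diagram scores a model earned across all evaluated reports."""
    return mean(diagram_score(m) for m in per_report)


def rank_models(averages: dict[str, float]) -> list[tuple[str, float]]:
    """Order models from highest to lowest average score."""
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


# The averages reported in the summary.
REPORTED = {"GPT-4o": 6.29, "Gemini-1.5-Flash": 5.28, "Janus-4o": 3.64}
```

Applied to the reported averages, rank_models reproduces the published order; by simple subtraction, GPT-4o leads Gemini-1.5-Flash by 1.01 points and Janus-4o by 2.65 points.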
