ARTFEED — Contemporary Art Intelligence

TransitLM: Large-Scale Dataset for Map-Free Transit Route Generation

ai-technology · 2026-05-23

TransitLM is a newly introduced dataset of more than 13 million transit route planning records from four cities in China, covering 120,845 stations and 13,666 lines. It serves both as a continual pre-training corpus and as a benchmark with three evaluation tasks. According to the reported results, a large language model (LLM) trained on TransitLM generates structurally valid routes with high accuracy and implicitly grounds arbitrary GPS coordinates to appropriate stations without an explicit mapping step. This suggests that transit route planning can be learned entirely from data. The dataset and benchmark are available on Hugging Face.
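For context, the explicit mapping step that the model reportedly does without is usually a nearest-station lookup. The sketch below shows one minimal version using the haversine distance. The station names, coordinates, and the `nearest_station` helper are hypothetical and are not drawn from TransitLM or its benchmark.

```python
import math

# Hypothetical station table: names and coordinates are illustrative only,
# not taken from TransitLM.
STATIONS = {
    "Station A": (39.9000, 116.4000),
    "Station B": (39.9100, 116.4200),
    "Station C": (39.9300, 116.3800),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations=STATIONS):
    """Return (name, distance_km) of the station closest to a GPS point."""
    name = min(stations, key=lambda s: haversine_km(lat, lon, *stations[s]))
    return name, haversine_km(lat, lon, *stations[name])
```

Calling `nearest_station(39.9005, 116.4005)` on this toy table picks "Station A". A map-free model has to learn an equivalent association from the training records themselves.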

Key facts

  • Dataset includes over 13 million transit route planning records.
  • Covers four Chinese cities.
  • Includes 120,845 stations and 13,666 lines.
  • Used as a continual pre-training corpus and benchmark.
  • Three evaluation tasks with complementary metrics.
  • An LLM trained on TransitLM produces structurally valid routes with high accuracy.
  • The model implicitly grounds arbitrary GPS coordinates to appropriate stations without explicit mapping.
  • Dataset available at https://huggingface.co/datasets/GD-ML/TransitLM
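
One plausible reading of "structurally valid" is that every leg of a generated route boards and alights at stations on the named line, and that consecutive legs connect at a shared station. The sketch below checks exactly that. The benchmark's actual validity criteria are not described here, so the route format, the `LINES` table, and the `is_structurally_valid` function are assumptions for illustration only.

```python
# Hypothetical network: line name -> ordered list of station names.
# Not taken from TransitLM.
LINES = {
    "Line 1": ["A", "B", "C", "D"],
    "Line 2": ["C", "E", "F"],
}


def is_structurally_valid(route, lines=LINES):
    """Check a route given as a list of (line, board, alight) legs.

    A route passes when it is non-empty, every leg uses a known line,
    both stations of a leg lie on that line and differ, and each leg
    starts where the previous one ended.
    """
    if not route:
        return False
    prev_alight = None
    for line, board, alight in route:
        stops = lines.get(line)
        if stops is None:
            return False  # unknown line
        if board not in stops or alight not in stops or board == alight:
            return False  # station not on this line, or zero-length leg
        if prev_alight is not None and board != prev_alight:
            return False  # transfer does not connect to the previous leg
        prev_alight = alight
    return True
```

For example, `[("Line 1", "A", "C"), ("Line 2", "C", "F")]` passes, while `[("Line 1", "A", "B"), ("Line 2", "C", "F")]` fails because the transfer does not connect. This simplified check treats walking transfers between distinct stations as invalid, which a real benchmark might allow.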

Entities

Institutions

  • arXiv
  • Hugging Face

Locations

  • China

Sources