ChinaTravel Benchmark Tests Language Agents on Open-Ended Travel Planning

publication · 2026-04-30

Researchers have introduced ChinaTravel, a benchmark for evaluating language agents on open-ended travel planning tasks. Unlike existing benchmarks that rely on slot-filling with predefined constraint menus, ChinaTravel captures the compositional, diverse, and often implicit nature of real user requirements. The benchmark has three components: a practical sandbox aligned with multi-day, multi-POI (point-of-interest) travel planning; a domain-specific language (DSL) for scalable evaluation covering feasibility, constraint satisfaction, and preference comparison; and an open-ended dataset integrating diverse travel requirements and implicit intent from 1,154 human participants. The work is detailed in arXiv:2412.13682v5.

Key facts

ChinaTravel is a benchmark for language agents in travel planning.
It targets a gap in existing benchmarks: open-ended natural-language requests.
Includes a practical sandbox for multi-day, multi-POI planning.
Uses a domain-specific language (DSL) for scalable evaluation.
Dataset integrates requirements from 1,154 human participants.
Focuses on compositional constraint validation.
Published on arXiv with ID 2412.13682v5.
Replaces the slot-filling paradigm with open-ended queries.
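
To make "compositional constraint validation" concrete, here is a minimal Python sketch of how small constraint checks might be combined into one requirement and evaluated against a multi-day plan. It is not the ChinaTravel DSL: the `Activity` type, the helper names (`visits`, `total_cost_at_most`, `spans_days`, `all_of`), and the sample itinerary with its prices are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Activity:
    """One stop in a plan. Hypothetical schema, not the benchmark's."""
    day: int
    kind: str   # e.g. "attraction", "restaurant", "hotel"
    name: str
    cost: float


Plan = List[Activity]
Constraint = Callable[[Plan], bool]


def visits(name: str) -> Constraint:
    """True if the plan contains an activity with this name."""
    return lambda plan: any(a.name == name for a in plan)


def total_cost_at_most(limit: float) -> Constraint:
    """True if the summed cost of all activities is within the limit."""
    return lambda plan: sum(a.cost for a in plan) <= limit


def spans_days(n: int) -> Constraint:
    """True if the plan covers exactly n distinct days."""
    return lambda plan: len({a.day for a in plan}) == n


def all_of(*constraints: Constraint) -> Constraint:
    """Compose constraints: the result holds only if every part holds."""
    return lambda plan: all(c(plan) for c in constraints)


# Made-up two-day itinerary; the costs are illustrative, not real prices.
plan = [
    Activity(day=1, kind="attraction", name="West Lake", cost=0.0),
    Activity(day=1, kind="restaurant", name="Lakeside Restaurant", cost=180.0),
    Activity(day=1, kind="hotel", name="City Hotel", cost=400.0),
    Activity(day=2, kind="attraction", name="Lingyin Temple", cost=75.0),
]

# An open-ended request such as "two days, see West Lake, stay under 700"
# becomes a composition of simple checks.
requirement = all_of(visits("West Lake"), total_cost_at_most(700), spans_days(2))
satisfied = requirement(plan)
```

Because each check is an ordinary function over a plan, new requirements can be expressed by composing existing checks rather than by extending a fixed menu of slots, which is the contrast with slot-filling that the benchmark draws.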
