ARTFEED — Contemporary Art Intelligence

ChinaTravel Benchmark Tests Language Agents on Open-Ended Travel Planning

publication · 2026-04-30

Researchers have introduced ChinaTravel, a benchmark for evaluating language agents on open-ended travel planning. Existing benchmarks typically rely on slot-filling, in which constraints are chosen from a predefined menu. ChinaTravel instead aims to capture the compositional, diverse, and often implicit requirements that real users express. The benchmark has three components: a practical sandbox for multi-day travel planning across multiple points of interest (POIs); a domain-specific language (DSL) that supports scalable evaluation of feasibility, constraint satisfaction, and preference comparison; and an open-ended dataset that combines diverse travel requirements and implicit intents from 1,154 human participants. The work is detailed in arXiv:2412.13682v5.
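To make "compositional constraint validation" more concrete, here is a minimal Python sketch of how small checks can be combined into a compound travel requirement and evaluated automatically against a plan. It is not ChinaTravel's DSL. The `Activity` structure, its fields, the helper names (`budget_at_most`, `visits`, `days_at_most`, `all_of`, `any_of`, `negate`, `prefer_cheaper`), and the placeholder POI names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical plan item: one activity on one day of a multi-day itinerary.
@dataclass
class Activity:
    day: int      # 1-based day index
    poi: str      # point-of-interest name (placeholder)
    kind: str     # e.g. "attraction", "restaurant", "hotel"
    cost: float   # cost in an unspecified currency

Plan = List[Activity]
Constraint = Callable[[Plan], bool]

def total_cost(plan: Plan) -> float:
    return sum(a.cost for a in plan)

# Atomic constraints (hypothetical; not the benchmark's actual primitives).
def budget_at_most(limit: float) -> Constraint:
    return lambda plan: total_cost(plan) <= limit

def visits(poi: str) -> Constraint:
    return lambda plan: any(a.poi == poi for a in plan)

def days_at_most(n: int) -> Constraint:
    return lambda plan: max((a.day for a in plan), default=0) <= n

# Combinators compose atomic checks into compound requirements.
def all_of(*cs: Constraint) -> Constraint:
    return lambda plan: all(c(plan) for c in cs)

def any_of(*cs: Constraint) -> Constraint:
    return lambda plan: any(c(plan) for c in cs)

def negate(c: Constraint) -> Constraint:
    return lambda plan: not c(plan)

# Soft preference (hypothetical): between two plans, return the cheaper one.
def prefer_cheaper(a: Plan, b: Plan) -> Plan:
    return a if total_cost(a) <= total_cost(b) else b

# Example requirement: "at most two days, total under 500, include Museum A
# or Park B, and avoid Restaurant X" (all names are placeholders).
plan = [
    Activity(1, "Museum A", "attraction", 60.0),
    Activity(1, "Restaurant Y", "restaurant", 200.0),
    Activity(2, "Park B", "attraction", 0.0),
]
requirement = all_of(
    days_at_most(2),
    budget_at_most(500.0),
    any_of(visits("Museum A"), visits("Park B")),
    negate(visits("Restaurant X")),
)
print(requirement(plan))  # True
```

Writing each requirement as a small predicate and combining them with `all_of`, `any_of`, and `negate` lets compound requests be checked mechanically without hand-grading each plan, which is roughly the scalability argument for an evaluation DSL; ChinaTravel's actual grammar and operators may differ.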

Key facts

  • ChinaTravel is a benchmark for language agents in travel planning.
  • It addresses the lack of open-ended natural-language interaction in existing travel-planning benchmarks.
  • Includes a practical sandbox for multi-day, multi-POI planning.
  • Uses a domain-specific language (DSL) for scalable evaluation.
  • Dataset integrates requirements from 1,154 human participants.
  • Focuses on compositional constraint validation.
  • Published on arXiv with ID 2412.13682v5.
  • Replaces the slot-filling paradigm with open-ended queries.

Entities

Institutions

  • arXiv

Sources