ChinaTravel Benchmark Tests Language Agents on Open-Ended Travel Planning
Researchers have introduced ChinaTravel, a benchmark for evaluating language agents on open-ended travel planning tasks. Unlike existing benchmarks that rely on slot-filling with predefined constraint menus, ChinaTravel captures the compositional, diverse, and often implicit nature of real user requirements. The benchmark has three components: a practical sandbox aligned with multi-day travel planning across multiple points of interest (POIs); a domain-specific language (DSL) for scalable evaluation covering feasibility, constraint satisfaction, and preference comparison; and an open-ended dataset that integrates diverse travel requirements and implicit intent from 1,154 human participants. The work is detailed in arXiv:2412.13682v5.
Key facts
- ChinaTravel is a benchmark for language agents in travel planning.
- It addresses a gap in existing benchmarks: open-ended natural-language interaction.
- Includes a practical sandbox for multi-day, multi-POI planning.
- Uses a domain-specific language (DSL) for scalable evaluation.
- Dataset integrates requirements from 1,154 human participants.
- Focuses on compositional constraint validation.
- Published on arXiv with ID 2412.13682v5.
- Replaces the slot-filling paradigm with open-ended queries.
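To make the three evaluation aspects named above (feasibility, constraint satisfaction, preference comparison) more concrete, here is a minimal Python sketch of how compositional constraints over a multi-day plan might be checked. It is not ChinaTravel's actual DSL or sandbox API: the `Activity` type, the combinators `all_of` and `any_of`, the helper names, and the sample POIs and prices are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Activity:
    """One scheduled stop in a plan (hypothetical schema, not the benchmark's)."""
    day: int
    poi: str
    kind: str      # e.g. "attraction" or "restaurant"
    start: float   # hour of day, e.g. 13.5 means 13:30
    end: float
    cost: float


Plan = List[Activity]
Constraint = Callable[[Plan], bool]


def is_feasible(plan: Plan) -> bool:
    """Feasibility: positive durations and no overlapping activities within a day."""
    by_day: Dict[int, List[Activity]] = {}
    for act in plan:
        if act.end <= act.start:
            return False
        by_day.setdefault(act.day, []).append(act)
    for acts in by_day.values():
        acts.sort(key=lambda a: a.start)
        for prev, nxt in zip(acts, acts[1:]):
            if nxt.start < prev.end:
                return False
    return True


def total_cost(plan: Plan) -> float:
    return sum(a.cost for a in plan)


# Atomic constraints: each returns a predicate over a whole plan.
def max_budget(limit: float) -> Constraint:
    return lambda plan: total_cost(plan) <= limit


def visits(poi: str) -> Constraint:
    return lambda plan: any(a.poi == poi for a in plan)


# Combinators: build compound requirements out of simpler ones.
def all_of(*constraints: Constraint) -> Constraint:
    return lambda plan: all(c(plan) for c in constraints)


def any_of(*constraints: Constraint) -> Constraint:
    return lambda plan: any(c(plan) for c in constraints)


def prefer_cheaper(a: Plan, b: Plan) -> Plan:
    """Preference comparison: between two plans, pick the lower total cost."""
    return a if total_cost(a) <= total_cost(b) else b


# Illustrative data only; POI names and prices are made up.
plan_a = [
    Activity(1, "Park A", "attraction", 9, 12, 0),
    Activity(1, "Restaurant C", "restaurant", 12, 13.5, 120),
    Activity(2, "Museum B", "attraction", 10, 12, 75),
]
plan_b = [
    Activity(1, "Park A", "attraction", 9, 12, 0),
    Activity(1, "Restaurant D", "restaurant", 12, 13.5, 300),
    Activity(2, "Museum B", "attraction", 10, 12, 75),
]

# "Stay under 300, see Park A, and see either Museum B or Tower E."
requirement = all_of(
    max_budget(300),
    visits("Park A"),
    any_of(visits("Museum B"), visits("Tower E")),
)
```

Calling `is_feasible(plan_a)` and `requirement(plan_a)` checks a plan in two separate stages, and `prefer_cheaper(plan_a, plan_b)` ranks two candidates. Writing constraints as composable predicates is one way to express requirements that a fixed slot-filling menu cannot enumerate in advance.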
Entities
Institutions
- arXiv