MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings
A new system called MTRouter addresses the high inference costs of multi-turn, long-horizon tasks in large language models (LLMs). Given a fixed cost budget, it selects which model from a pool to invoke at each turn: interaction history and candidate models are encoded into joint embeddings, and an outcome estimator learned from logged trajectories scores each history-model pairing. On ScienceWorld, MTRouter surpasses GPT-5 while reducing total cost by 58.7%. On Humanity's Last Exam (HLE), it achieves competitive accuracy with a 43.4% cost reduction relative to GPT-5. These gains extend to held-out tasks. The research is published on arXiv (2604.23530).
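The per-turn selection described above can be sketched in a few lines. This is a hedged, illustrative toy, not the paper's implementation: the encoder, the linear scorer, the model names, and the cost figures are all placeholders standing in for the learned joint-embedding model and the outcome estimator trained on logged trajectories.

```python
import math
import random

def embed_history(history_texts, dim=8):
    """Toy bag-of-words hash embedding standing in for a learned history encoder."""
    vec = [0.0] * dim
    for text in history_texts:
        for tok in text.split():
            vec[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def route(history_texts, models, estimator, remaining_budget):
    """Return the affordable model with the highest predicted outcome.

    models maps name -> (model_embedding, cost_per_call); candidates whose
    per-call cost exceeds the remaining budget are skipped, enforcing the
    fixed cost budget at every turn.
    """
    h = embed_history(history_texts)
    best_name, best_score = None, float("-inf")
    for name, (model_emb, cost_per_call) in models.items():
        if cost_per_call > remaining_budget:
            continue  # over budget: never invoke this model
        joint = h + model_emb  # joint history-model embedding
        score = estimator(joint)  # predicted task outcome for this pairing
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Stand-in linear scorer in place of an estimator fit on logged trajectories.
rng = random.Random(0)
weights = [rng.gauss(0, 1) for _ in range(16)]
estimator = lambda x: sum(a * b for a, b in zip(weights, x))

# Hypothetical pool: a cheap model and an expensive one (costs are invented).
models = {
    "small-model": ([rng.gauss(0, 1) for _ in range(8)], 0.001),
    "large-model": ([rng.gauss(0, 1) for _ in range(8)], 0.05),
}
choice = route(["User: measure the boiling point", "Agent: done"],
               models, estimator, remaining_budget=0.01)
# Only "small-model" fits the 0.01 budget, so it is selected.
```

Under a tight budget the router is forced onto cheaper models; with slack, the estimator's predicted outcome decides, which is how a system of this shape can trade a modest accuracy change for large cost reductions.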
Key facts
- MTRouter is a cost-aware multi-turn LLM routing system.
- It selects which model to invoke at each turn from a model pool.
- It uses joint history-model embeddings and an outcome estimator.
- On ScienceWorld, MTRouter surpasses GPT-5 with 58.7% cost reduction.
- On HLE, it achieves competitive accuracy with 43.4% cost reduction.
- Gains carry over to held-out tasks.
- The paper is on arXiv with ID 2604.23530.
- Multi-turn, long-horizon tasks are increasingly common for LLMs.
Entities
Institutions
- arXiv