OrcaRouter: Hybrid LLM Router Achieves 72.08 Arena Score
Researchers have introduced OrcaRouter, a large language model (LLM) router designed for production use. It is a LinUCB-based contextual bandit over lexical and sentence-embedding features, trained with a hybrid offline-online protocol. In the offline stage, OrcaRouter evaluates every candidate model on a curated set of routing prompts and collects detailed feedback into a reward matrix, which is used to fit one ridge regressor per arm (one arm per candidate model). At deployment, the router initializes from these offline parameters and can keep learning from feedback, updating only the arm of the model that was selected. As of May 20, 2026, OrcaRouter-Adaptive ranked second on the RouterArena leaderboard with an arena score of 72.08 and 75.54% accuracy.
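For orientation, the textbook LinUCB formulation with a ridge warm start can be written as below. This is a sketch of the standard algorithm, not necessarily the exact variant OrcaRouter uses. The symbols are assumptions introduced here for illustration: x_i is the feature vector of offline prompt i, R_{i,a} is the reward-matrix entry for prompt i and arm a, lambda is the ridge penalty, and alpha is the exploration weight.

```latex
\begin{aligned}
\text{Offline fit for arm } a:\quad
  & A_a = \lambda I + \sum_{i=1}^{n} x_i x_i^\top, \qquad
    b_a = \sum_{i=1}^{n} R_{i,a}\, x_i, \qquad
    \hat\theta_a = A_a^{-1} b_a \\
\text{Selection at step } t:\quad
  & a_t = \arg\max_a \left( \hat\theta_a^\top x_t
    + \alpha \sqrt{x_t^\top A_a^{-1} x_t} \right) \\
\text{Online update (chosen arm only):}\quad
  & A_{a_t} \leftarrow A_{a_t} + x_t x_t^\top, \qquad
    b_{a_t} \leftarrow b_{a_t} + r_t\, x_t
\end{aligned}
```

Because the offline fit and the online update maintain the same statistics (A_a, b_a), starting online learning from the offline values amounts to continuing the same ridge regression with new data.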
Key facts
- OrcaRouter is a production-oriented LLM router
- It uses a LinUCB-based contextual bandit over lexical and sentence-embedding features
- It employs a hybrid offline-online learning protocol
- Offline training evaluates each candidate model on curated routing prompts
- The resulting reward matrix is used to fit one ridge regressor per arm
- At deployment, the router initializes from the offline parameters and can continue learning from feedback
- OrcaRouter-Adaptive ranked second on the RouterArena leaderboard as of May 20, 2026
- It achieved an arena score of 72.08 and 75.54% accuracy
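The offline-then-online protocol listed above can be illustrated with a minimal sketch in Python using NumPy. It implements generic LinUCB with one ridge model per arm and is not OrcaRouter's actual code. The class name `WarmStartLinUCB`, the parameters `alpha` and `lam`, and the synthetic data are all hypothetical, and the feature vector `x` is a stand-in for the concatenated lexical and sentence-embedding features.

```python
import numpy as np


class WarmStartLinUCB:
    """Generic LinUCB, one ridge model per arm (hypothetical sketch)."""

    def __init__(self, n_arms, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha  # exploration weight (assumed value)
        # Per-arm ridge statistics: A = lam*I + sum(x x^T), b = sum(r * x)
        self.A = np.stack([lam * np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def fit_offline(self, X, R):
        """Warm start. X: (n_prompts, dim), R: (n_prompts, n_arms)."""
        gram = X.T @ X
        for a in range(self.A.shape[0]):
            self.A[a] += gram            # every arm was run on every prompt
            self.b[a] += X.T @ R[:, a]   # column a of the reward matrix

    def theta(self, a):
        """Ridge estimate for arm a."""
        return np.linalg.solve(self.A[a], self.b[a])

    def select(self, x):
        """Pick the arm with the highest upper confidence bound."""
        scores = []
        for a in range(self.A.shape[0]):
            mean = x @ self.theta(a)
            width = np.sqrt(x @ np.linalg.solve(self.A[a], x))
            scores.append(mean + self.alpha * width)
        return int(np.argmax(scores))

    def update(self, a, x, reward):
        """Online step: only the selected arm's statistics change."""
        self.A[a] += np.outer(x, x)
        self.b[a] += reward * x


# Synthetic demo: 3 candidate models, 4-dimensional prompt features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))          # offline routing prompts
true_theta = rng.normal(size=(3, 4))   # made-up ground truth per arm
R = X @ true_theta.T                   # offline reward matrix

router = WarmStartLinUCB(n_arms=3, dim=4)
router.fit_offline(X, R)

x_new = rng.normal(size=4)             # features of an incoming prompt
arm = router.select(x_new)
router.update(arm, x_new, reward=float(x_new @ true_theta[arm]))
```

Solving the linear system on every call keeps the sketch short. A production router would more likely cache each arm's inverse and refresh it with a rank-one update.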
Entities
Institutions
- arXiv
- RouterArena