BRANE: LLM-Based Query Optimization for Retrieval Agents
A new paper on arXiv (2605.27361) introduces BRANE, a system that uses an LLM to convert each natural-language query into workload-specific characteristics, then trains a lightweight predictor for each pipeline configuration to estimate whether that configuration will answer the query correctly. At inference time, BRANE selects the configuration that maximizes predicted correctness penalized by cost, so the cost-quality tradeoff can be tuned without retraining. The approach targets per-query optimization, which remains largely untapped in retrieval agents because they typically rely on hand-tuned pipelines. Evaluated on the MuSiQue and BrowseC datasets, BRANE is reported to achieve a better balance between accuracy and cost.
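The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Config` class, the `select_config` function, the `cost_weight` parameter, the `hops` feature, and the two toy predictors are all hypothetical, and the linear penalty (predicted correctness minus `cost_weight` times cost) is only one plausible reading of "penalized by cost". In BRANE the query characteristics come from an LLM; here they are passed in directly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Query characteristics; in BRANE an LLM produces these, here they are given.
Features = Dict[str, float]


def clamp01(x: float) -> float:
    """Keep a toy prediction inside the probability range [0, 1]."""
    return max(0.0, min(1.0, x))


@dataclass(frozen=True)
class Config:
    """Hypothetical pipeline configuration (e.g. an LLM plus a retriever)."""
    name: str
    cost: float  # assumed per-query cost, in arbitrary units
    predict_correct: Callable[[Features], float]  # stand-in for a trained predictor


def select_config(features: Features, configs: List[Config], cost_weight: float) -> Config:
    """Pick the config with the highest predicted correctness minus weighted cost."""
    return max(
        configs,
        key=lambda c: c.predict_correct(features) - cost_weight * c.cost,
    )


# Toy predictors (assumptions): the cheap pipeline degrades faster on multi-hop queries.
CONFIGS = [
    Config("cheap", cost=0.1, predict_correct=lambda f: clamp01(0.9 - 0.2 * f["hops"])),
    Config("strong", cost=1.0, predict_correct=lambda f: clamp01(0.95 - 0.05 * f["hops"])),
]

query_features = {"hops": 3.0}  # hypothetical characteristic of one query

# Only the cost weight changes between the two calls; the predictors stay fixed.
print(select_config(query_features, CONFIGS, cost_weight=0.0).name)
print(select_config(query_features, CONFIGS, cost_weight=1.0).name)
```

Because the cost weight enters only at selection time, changing it shifts the choice between configurations while the predictors stay untouched, which is consistent with the summary's point that the tradeoff can be tuned without retraining.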
Key facts
- arXiv paper 2605.27361 introduces BRANE
- BRANE uses an LLM to convert queries into workload-specific characteristics
- Trains lightweight per-configuration predictors for correctness estimation
- Selects configuration maximizing predicted correctness penalized by cost
- Enables tunable cost-quality tradeoff without retraining
- Evaluated on MuSiQue and BrowseC datasets
- Addresses the per-query optimization gap in retrieval agents, which typically rely on hand-tuned pipelines
- Modern retrieval agents have many configuration choices (LLM, retriever, etc.)
Entities
Institutions
- arXiv