BRANE: LLM-Based Query Optimization for Retrieval Agents

ai-technology · 2026-05-27

A new paper on arXiv (2605.27361) introduces BRANE, a system that uses an LLM to convert each natural-language query into workload-specific characteristics, then trains a lightweight predictor per pipeline configuration to estimate whether that configuration will answer the query correctly. At inference, BRANE selects the configuration that maximizes predicted correctness penalized by cost. Because the penalty is applied at selection time, the cost-quality tradeoff can be tuned without retraining the predictors. The approach targets per-query optimization, which remains largely untapped in retrieval agents because they typically rely on hand-tuned pipelines. On the MuSiQue and BrowseC datasets, the paper reports an improved accuracy-cost balance.

Key facts

arXiv paper 2605.27361 introduces BRANE
BRANE uses an LLM to convert queries into workload-specific characteristics
Trains lightweight per-configuration predictors for correctness estimation
Selects configuration maximizing predicted correctness penalized by cost
Enables tunable cost-quality tradeoff without retraining
Evaluated on MuSiQue and BrowseC datasets
Addresses per-query optimization gap in retrieval agents
Modern retrieval agents have many configuration choices (LLM, retriever, etc.)
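
The selection step described in the summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the names Config, select_config, and cost_weight, the linear form of the penalty (predicted correctness minus cost_weight times cost), the single "difficulty" feature, and the toy predictors are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class Config:
    """A hypothetical pipeline configuration (e.g. an LLM + retriever pairing)."""
    name: str
    cost: float  # assumed normalized per-query cost


# A predictor maps query characteristics to an estimated probability of a correct answer.
Predictor = Callable[[Sequence[float]], float]


def select_config(
    features: Sequence[float],
    configs: Sequence[Config],
    predictors: Dict[str, Predictor],
    cost_weight: float,
) -> Config:
    """Return the configuration with the highest cost-penalized predicted correctness.

    The linear penalty is an assumption made for this sketch.
    """
    def score(cfg: Config) -> float:
        return predictors[cfg.name](features) - cost_weight * cfg.cost

    return max(configs, key=score)


# Toy setup: one feature standing in for query difficulty (hypothetical).
configs = [Config("cheap", cost=0.1), Config("strong", cost=1.0)]
predictors = {
    "cheap": lambda f: 0.9 - 0.5 * f[0],     # degrades quickly on hard queries
    "strong": lambda f: 0.95 - 0.1 * f[0],   # stays accurate, but costs more
}

hard_query = [0.8]
quality_first = select_config(hard_query, configs, predictors, cost_weight=0.0)
cost_aware = select_config(hard_query, configs, predictors, cost_weight=0.5)
print(quality_first.name, cost_aware.name)
```

Changing cost_weight shifts the choice between configurations while the predictors stay fixed, which is the sense in which the tradeoff is tunable without retraining.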
