Bits-over-Random Metric Optimizes LLM Tool Selection
A new metric, Bits-over-Random (BoR), evaluates how many tools should be shown to an LLM agent during tool retrieval. Fixed shortlist sizes often fail: too many tools confuse the model, while too few may omit the correct one. BoR is chance-corrected. It measures whether success at a given shortlist depth exceeds what random selection of the same number of tools would achieve. The metric is tested on three benchmarks with registries of 20 to 3,251 tools. BoR also serves as a reinforcement learning (RL) reward for choosing the shortlist depth per query. The RL agent is kept deliberately simple so that the experiments probe the metric's effectiveness.
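The summary does not give BoR's exact formula. One natural reading of "bits over random", assumed here and not confirmed by the source, is the base-2 log of the observed success rate at depth K divided by the probability that a uniformly random K-tool shortlist contains a correct tool. The function names `random_hit_probability` and `bits_over_random` are hypothetical. The random baseline itself is standard combinatorics: with R correct tools among N, a random K-subset misses all of them with probability C(N-R, K) / C(N, K).

```python
from math import comb, log2


def random_hit_probability(n_tools: int, depth: int, n_relevant: int = 1) -> float:
    """Chance that a uniformly random shortlist of `depth` tools contains
    at least one of the `n_relevant` correct tools in a registry of `n_tools`."""
    if not 0 < depth <= n_tools:
        raise ValueError("depth must be between 1 and n_tools")
    if not 0 < n_relevant <= n_tools:
        raise ValueError("n_relevant must be between 1 and n_tools")
    # P(miss every correct tool) = C(N - R, K) / C(N, K); comb returns 0 when K > N - R.
    return 1.0 - comb(n_tools - n_relevant, depth) / comb(n_tools, depth)


def bits_over_random(success_rate: float, n_tools: int, depth: int, n_relevant: int = 1) -> float:
    """Hypothetical BoR: log2(observed success / random-shortlist success).

    Positive values mean the retriever beats chance at this depth; 0 means
    it does no better than a random shortlist of the same size.
    """
    if success_rate <= 0.0:
        return float("-inf")  # never succeeding carries no evidence over chance
    return log2(success_rate / random_hit_probability(n_tools, depth, n_relevant))


# A retriever that hits 50% of the time at depth 5 in a 20-tool registry,
# where a random 5-tool shortlist hits 25% of the time:
print(bits_over_random(0.5, n_tools=20, depth=5))  # → 1.0
```

Under this assumed form, the ceiling at depth K is log2(1 / p_random(K)), which shrinks as K grows. Showing the entire registry (K = N) earns exactly 0 bits even at 100% success, which captures the intuition that a large shortlist should not get credit for success that random selection would also achieve.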
Key facts
- BoR is a chance-corrected metric for tool shortlist depth.
- Fixed shortlist sizes are suboptimal for LLM tool retrieval.
- BoR compares success at a given depth with the success rate of a random shortlist of the same size.
- Evaluated on three tool-selection benchmarks.
- Tool registries range from 20 to 3,251 tools.
- BoR is used as a reinforcement learning reward.
- The RL agent is deliberately simple.
- The approach treats tool count as the object of evaluation.
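The summary describes using BoR as an RL reward for depth selection but does not specify the agent or the reward's exact form. The following is a minimal sketch under stated assumptions: an epsilon-greedy bandit over a few candidate depths. It is a non-contextual simplification, because a true per-query agent would condition on query features. The per-query reward is also an assumption. With one correct tool among N, a hit at depth K is worth log2(N / K) bits, because a random shortlist hits with probability K / N, and a miss earns 0. All names here (`per_query_reward`, `choose_depth`, `succeeds`) are hypothetical.

```python
import random
from math import log2
from typing import Callable, Sequence


def per_query_reward(success: bool, n_tools: int, depth: int) -> float:
    """Hypothetical BoR-style reward for one query, assuming one correct tool.

    A hit at depth K earns log2(N / K) bits (its surprise under random
    selection, which hits with probability K / N); a miss earns 0.
    """
    return log2(n_tools / depth) if success else 0.0


def choose_depth(
    depths: Sequence[int],
    n_tools: int,
    succeeds: Callable[[int], bool],
    rounds: int = 500,
    epsilon: float = 0.1,
    seed: int = 0,
) -> int:
    """Epsilon-greedy bandit over candidate depths; returns the greedy depth."""
    rng = random.Random(seed)
    totals = [0.0] * len(depths)
    counts = [0] * len(depths)

    def update(i: int) -> None:
        # Run one simulated query at depth depths[i] and record its reward.
        totals[i] += per_query_reward(succeeds(depths[i]), n_tools, depths[i])
        counts[i] += 1

    def greedy() -> int:
        # Index of the depth with the highest mean reward so far.
        return max(range(len(depths)), key=lambda i: totals[i] / counts[i])

    # Try every depth once so each mean estimate is defined.
    for i in range(len(depths)):
        update(i)
    for _ in range(rounds):
        i = rng.randrange(len(depths)) if rng.random() < epsilon else greedy()
        update(i)
    return depths[greedy()]


# Toy environment: depth 1 omits the correct tool, depth 10 confuses the model.
print(choose_depth([1, 3, 5, 10], n_tools=20, succeeds=lambda k: 3 <= k <= 5))  # → 3
```

Because a hit at a smaller depth is worth more bits, the agent prefers the shallowest depth that still succeeds. Depth 3 earns about 2.74 bits per hit versus 2 bits at depth 5, so the toy run settles on 3. In a stochastic setting the expected reward becomes p(K) · log2(N / K), which trades success probability against shortlist size.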
Entities
Institutions
- arXiv