InfoTree: A Submodular Framework for Tool-Use Agentic RL Under Fixed Budget
A recent arXiv paper formalizes Rollout Informativeness under a Fixed Budget (RIFB), a framing aimed at improving reinforcement learning for tool-using agents. The authors show that independent samplers that ignore the budget have a nonzero collapse rate on hard prompts. They recast the selection of intermediate states as monotone submodular maximization, for which a greedy selector achieves a 1 - 1/e approximation guarantee. The InfoTree framework combines Uncertainty-aware Upper Confidence Bound (UUCB) terms with an Adaptive Budget Allocator (ABA) to make better use of a fixed rollout budget.
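To make the greedy guarantee concrete, here is a minimal Python sketch of the textbook greedy algorithm for monotone submodular maximization under a cardinality budget. It is not the paper's selector: the coverage objective, the state names (`s1` to `s4`), and the functions `coverage_value` and `greedy_select` are all hypothetical, chosen only because set coverage is a standard monotone submodular function. For any such function, picking the item with the largest marginal gain at each step yields a value of at least (1 - 1/e) times the optimum for the same budget.

```python
from typing import Callable, Dict, List, Set


def coverage_value(selected: List[str], coverage: Dict[str, Set[int]]) -> int:
    """Toy objective (an assumption): number of distinct outcomes covered."""
    covered: Set[int] = set()
    for state in selected:
        covered |= coverage[state]
    return len(covered)


def greedy_select(
    candidates: List[str],
    value_fn: Callable[[List[str]], int],
    budget: int,
) -> List[str]:
    """Pick up to `budget` items, each time taking the largest marginal gain."""
    selected: List[str] = []
    remaining = sorted(candidates)  # sorted for deterministic tie-breaking
    current = value_fn(selected)
    for _ in range(budget):
        best, best_gain = None, 0
        for cand in remaining:
            gain = value_fn(selected + [cand]) - current
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # no candidate adds value; stop early
            break
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


# Hypothetical intermediate states and the outcomes each one would reveal.
coverage = {
    "s1": {1, 2, 3},
    "s2": {3, 4},
    "s3": {4, 5, 6, 7},
    "s4": {1, 2},
}
chosen = greedy_select(
    list(coverage), lambda sel: coverage_value(sel, coverage), budget=2
)
print(chosen, coverage_value(chosen, coverage))
```

In this toy instance the greedy choice happens to match the optimum for a budget of two. In general only the 1 - 1/e bound is guaranteed.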
Key facts
- Paper published on arXiv with ID 2605.05262
- Formalizes Rollout Informativeness under a Fixed Budget (RIFB)
- Proves budget-agnostic independent samplers collapse for hard prompts
- Recasts state selection as monotone submodular maximization
- Greedy selector achieves 1 - 1/e approximation guarantee
- UUCB terms derived as closed-form marginal gains
- InfoTree framework includes UUCB, ABA, and Speculative Expansion
- Token-level entropy bonus is an analytic consequence
Entities
Institutions
- arXiv