PORTool: New Algorithm Improves Multi-Tool LLM Reasoning
Researchers have introduced PORTool, an importance-aware policy optimization algorithm designed to improve multi-tool-integrated reasoning in large language models (LLMs). It targets the credit-assignment ambiguity that arises when tool-use agents are trained from outcome-only rewards: a single reward on the final answer does not reveal which intermediate decisions led to success or failure. PORTool generates a rewarded rollout tree in which trajectories share a prefix before branching, so alternative tool-use decisions can be compared directly within the same context. It estimates each step's importance with a correctness-dominant signal, based on whether that step's descendants produce a correct final answer, plus an auxiliary term. The work is detailed in a paper on arXiv (2510.26020).
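To make the idea concrete, here is a minimal Python sketch of step-importance estimation on a rollout tree. It is an illustration under stated assumptions, not the paper's implementation: the summary above does not give the exact formula, so the `Node` class, the use of the fraction of correct descendant leaves as the correctness signal, the `aux` score, the `aux_weight` value, and the mean-centered sibling comparison are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One step (e.g. a tool call) in a rollout tree; leaves carry the outcome."""
    name: str
    children: List["Node"] = field(default_factory=list)
    correct: Optional[bool] = None  # set only on leaves (final answers)
    aux: float = 0.0                # hypothetical auxiliary score in [0, 1]


def leaf_outcomes(node: Node) -> List[bool]:
    """Collect final-answer correctness for every leaf at or below this node."""
    if not node.children:
        return [bool(node.correct)]
    return [o for child in node.children for o in leaf_outcomes(child)]


def importance(node: Node, aux_weight: float = 0.1) -> float:
    """Assumed form: fraction of correct descendants plus a small auxiliary term."""
    outcomes = leaf_outcomes(node)
    return sum(outcomes) / len(outcomes) + aux_weight * node.aux


def sibling_advantages(parent: Node, aux_weight: float = 0.1) -> Dict[str, float]:
    """Compare alternative steps taken from the same prefix (mean-centered)."""
    scores = {c.name: importance(c, aux_weight) for c in parent.children}
    mean = sum(scores.values()) / len(scores)
    return {name: score - mean for name, score in scores.items()}


# Toy tree: two alternative tool calls branch from one shared prefix.
root = Node("prefix", children=[
    Node("call_search", aux=1.0, children=[
        Node("answer_a", correct=True),
        Node("answer_b", correct=True),
    ]),
    Node("call_calculator", aux=1.0, children=[
        Node("answer_c", correct=True),
        Node("answer_d", correct=False),
    ]),
])

advantages = sibling_advantages(root)
print(advantages)  # positive for call_search, negative for call_calculator
```

Because both tool calls branch from the same prefix, the difference in their scores reflects the step itself rather than the context that preceded it. Keeping `aux_weight` small is one way to let correctness dominate the signal, which mirrors the "correctness-dominant" description.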
Key facts
- PORTool is an importance-aware policy optimization algorithm for multi-tool-integrated reasoning.
- It addresses credit-assignment ambiguity from outcome-only rewards.
- The algorithm generates a rewarded rollout tree with shared prefixes.
- It enables direct comparisons of alternative tool-use decisions.
- Each step's importance is estimated via a correctness-dominant signal.
- The signal checks whether a step's descendants produce a correct final answer.
- An auxiliary term is also used in importance estimation.
- The paper is available on arXiv with ID 2510.26020.
Entities
Institutions
- arXiv