ToolPRM: Fine-Grained Inference Scaling for Structured Outputs in Function Calling
ToolPRM is a framework for inference-time scaling of structured outputs in LLM function calling. It pairs fine-grained beam search with a process reward model that scores every intra-call decision, from selecting the function name to filling each argument. To train the reward model, the authors built the first fine-grained intra-call supervision dataset using function masking, rollout collection, and step-level annotation. ToolPRM outperforms both outcome-level and coarse-grained reward models in predictive accuracy and delivers consistent test-time gains across multiple function-calling benchmarks. The results also show that structured generation rewards an 'explore more but retain less' strategy, because early JSON errors cannot be recovered later in the call.
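The search procedure described above can be sketched in a few lines. This is a hedged toy illustration, not the paper's implementation: `toy_prm` is a stand-in scorer (the real ToolPRM is a trained model), and the candidate/step representation, `explore_width`, and `retain_width` are illustrative assumptions.

```python
# Toy sketch of fine-grained beam search guided by a step-level reward model.
# ToolPRM's actual model, decision representation, and hyperparameters differ;
# this only illustrates the control flow of scoring each intra-call decision.
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # A partial function call as a sequence of intra-call decisions:
    # function name first, then one argument assignment per step.
    steps: list = field(default_factory=list)
    score: float = 0.0


def toy_prm(steps):
    """Stand-in for a process reward model: scores the latest decision in
    context. A real PRM would be a trained model over the partial output."""
    last = steps[-1]
    return 1.0 if "=" in last or last.startswith("get_") else 0.1


def expand(candidate, options):
    """Extend a candidate with each possible next intra-call decision."""
    return [Candidate(candidate.steps + [opt], candidate.score) for opt in options]


def fine_grained_beam_search(step_options, explore_width=4, retain_width=2):
    """'Explore more but retain less': expand many continuations per step but
    keep only a few, since early JSON errors cannot be repaired later."""
    beam = [Candidate()]
    for options in step_options:
        pool = []
        for cand in beam:
            for nxt in expand(cand, options)[:explore_width]:
                nxt.score += toy_prm(nxt.steps)  # reward each decision
                pool.append(nxt)
        beam = sorted(pool, key=lambda c: c.score, reverse=True)[:retain_width]
    return beam[0]


best = fine_grained_beam_search([
    ["get_weather", "send_email"],        # function-name decision
    ["city=Paris", "city=42", "to=bob"],  # argument-filling decision
])
print(best.steps)  # → ['get_weather', 'city=Paris']
```

Because scoring happens per decision rather than per completed call, a bad function-name choice is pruned before any arguments are generated for it.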
Key facts
- ToolPRM is a process reward model for structured outputs in function calling.
- It combines fine-grained beam search with intra-call decision scoring.
- The first fine-grained intra-call supervision dataset was built via function masking, rollout collection, and step-level annotation.
- ToolPRM outperforms outcome and coarse-grained reward models in predictive accuracy.
- Consistent test-time gains are shown on multiple function-calling benchmarks.
- Structured generation follows 'explore more but retain less' due to unrecoverable early JSON errors.
- The framework focuses on inference scaling for structured outputs, not unstructured generation.
- The study is published on arXiv under Computer Science > Artificial Intelligence.
Entities
Institutions
- arXiv