ToolPRM: Fine-Grained Inference Scaling for Structured Outputs in Function Calling
ToolPRM is a framework for inference-time scaling of structured outputs in LLM function calling. It pairs fine-grained beam search with a process reward model that scores every intra-call decision, from selecting the function name to filling each argument. To train the reward model, the authors built the first fine-grained intra-call supervision dataset using function masking, rollout collection, and step-level annotation. ToolPRM outperforms both outcome-level and coarse-grained reward models in predictive accuracy and delivers consistent test-time gains across multiple function-calling benchmarks. The results also show that structured generation rewards an 'explore more but retain less' strategy, because early JSON errors cannot be recovered later in the call.
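The search procedure described above can be sketched in a few lines. This is a hedged toy illustration, not the paper's implementation: `toy_prm` is a stand-in scorer (the real ToolPRM is a trained model), and the candidate/step representation, `explore_width`, and `retain_width` are illustrative assumptions.

```python
# Toy sketch of fine-grained beam search guided by a step-level reward model.
# ToolPRM's actual model, decision representation, and hyperparameters differ;
# this only illustrates the control flow of scoring each intra-call decision.
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # A partial function call as a sequence of intra-call decisions:
    # function name first, then one argument assignment per step.
    steps: list = field(default_factory=list)
    score: float = 0.0


def toy_prm(steps):
    """Stand-in for a process reward model: scores the latest decision in
    context. A real PRM would be a trained model over the partial output."""
    last = steps[-1]
    return 1.0 if "=" in last or last.startswith("get_") else 0.1


def expand(candidate, options):
    """Extend a candidate with each possible next intra-call decision."""
    return [Candidate(candidate.steps + [opt], candidate.score) for opt in options]


def fine_grained_beam_search(step_options, explore_width=4, retain_width=2):
    """'Explore more but retain less': expand many continuations per step but
    keep only a few, since early JSON errors cannot be repaired later."""
    beam = [Candidate()]
    for options in step_options:
        pool = []
        for cand in beam:
            for nxt in expand(cand, options)[:explore_width]:
                nxt.score += toy_prm(nxt.steps)  # reward each decision
                pool.append(nxt)
        beam = sorted(pool, key=lambda c: c.score, reverse=True)[:retain_width]
    return beam[0]


best = fine_grained_beam_search([
    ["get_weather", "send_email"],        # function-name decision
    ["city=Paris", "city=42", "to=bob"],  # argument-filling decision
])
print(best.steps)  # → ['get_weather', 'city=Paris']
```

Because scoring happens per decision rather than per completed call, a bad function-name choice is pruned before any arguments are generated for it.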
Key facts
- ToolPRM is a process reward model for structured outputs in function calling.
- It combines fine-grained beam search with intra-call decision scoring.
- The first fine-grained intra-call supervision dataset was built via function masking, rollout collection, and step-level annotation.
- ToolPRM outperforms outcome and coarse-grained reward models in predictive accuracy.
- Consistent test-time gains are shown on multiple function-calling benchmarks.
- Structured generation follows 'explore more but retain less' due to unrecoverable early JSON errors.
- The framework focuses on inference scaling for structured outputs, not unstructured generation.
- The study is published on arXiv under Computer Science > Artificial Intelligence.
Entities
Institutions
- arXiv