ARTFEED — Contemporary Art Intelligence

ToolPRM: Fine-Grained Inference Scaling for Structured Outputs in Function Calling

ai-technology · 2026-04-30

ToolPRM is a newly introduced framework that improves inference scaling for structured outputs in LLM function calling. It pairs a fine-grained beam search with a process reward model that scores each intra-call decision, including function-name selection and argument filling. To train the model, the researchers built the first fine-grained intra-call supervision dataset using function masking, rollout collection, and step-level annotation. ToolPRM surpasses both outcome-level and coarse-grained reward models in predictive accuracy and delivers consistent test-time gains across multiple function-calling benchmarks. The findings also show that structured generation follows an "explore more but retain less" pattern, because early JSON errors cannot be corrected by later tokens.
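The core mechanism can be sketched as a beam search over call-construction steps, where a process reward model scores each partial call. The snippet below is a minimal illustration, not the paper's implementation: `prm_score` is a toy heuristic standing in for ToolPRM's learned reward model, and the function names and steps are invented for the example.

```python
# Toy stand-in for a learned process reward model (PRM): it scores a
# partial function call built up step by step. ToolPRM's real scorer is
# a trained model; this heuristic just rewards plausible decisions.
def prm_score(prefix_steps):
    """Score a partial call; higher means more promising."""
    score = 0.0
    for kind, value in prefix_steps:
        if kind == "name" and value in {"get_weather", "send_email"}:
            score += 1.0  # reward selecting a known function name
        if kind == "arg" and value[1]:
            score += 0.5  # reward filling an argument with a non-empty value
    return score

def beam_search(candidate_steps, beam_width=2):
    """Expand call-construction steps, keeping only top-scoring prefixes.

    candidate_steps: list of lists; each inner list holds the alternative
    decisions available at that step (function name, then each argument).
    """
    beams = [[]]
    for alternatives in candidate_steps:
        expanded = [beam + [alt] for beam in beams for alt in alternatives]
        expanded.sort(key=prm_score, reverse=True)
        beams = expanded[:beam_width]  # prune to the beam width after each step
    return beams

# Hypothetical decision points: pick a function name, then fill an argument.
steps = [
    [("name", "get_weather"), ("name", "get_wether")],    # name selection
    [("arg", ("city", "Paris")), ("arg", ("city", ""))],  # argument filling
]
best = beam_search(steps)[0]
# best keeps the correctly spelled name with the filled argument
```

Scoring inside the call, rather than only the finished output, is what distinguishes this from an outcome reward model: a misspelled function name is pruned before any arguments are generated for it.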

Key facts

  • ToolPRM is a process reward model for structured outputs in function calling.
  • It combines fine-grained beam search with intra-call decision scoring.
  • The first fine-grained intra-call supervision dataset was built via function masking, rollout collection, and step-level annotation.
  • ToolPRM outperforms outcome and coarse-grained reward models in predictive accuracy.
  • Consistent test-time gains are shown on multiple function-calling benchmarks.
  • Structured generation follows 'explore more but retain less' due to unrecoverable early JSON errors.
  • The framework focuses on inference scaling for structured outputs, not unstructured generation.
  • The study is published on arXiv under Computer Science > Artificial Intelligence.
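The "explore more but retain less" finding rests on a property of structured formats that is easy to demonstrate: once a JSON prefix contains a structural error, no continuation can make the output parse, so such beams are dead and worth pruning aggressively. The sketch below illustrates this with Python's standard `json` module; the prefixes and suffixes are invented examples, not data from the paper.

```python
import json

def any_completion_parses(prefix, suffixes):
    """Return True if any candidate suffix turns the prefix into valid JSON."""
    for s in suffixes:
        try:
            json.loads(prefix + s)
            return True
        except json.JSONDecodeError:
            continue
    return False

suffixes = ['"Paris"}', '1}', 'null}']

# A well-formed prefix can still be completed into valid JSON ...
print(any_completion_parses('{"city": ', suffixes))  # True

# ... but a prefix with an early error (unquoted key) is unrepairable
# by ANY continuation, so beams carrying it can be discarded immediately.
print(any_completion_parses('{city: ', suffixes))    # False
```

This asymmetry motivates exploring many candidates at early decision points while retaining few of them, since errors committed early can never be recovered downstream.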

Entities

Institutions

  • arXiv

Sources