TIER: New Reward Framework for Multi-Step Tool Use in LLMs

ai-technology · 2026-05-20

Researchers have introduced TIER (Trajectory-Invariant Execution Rewards), a reward framework for reinforcement learning in large language models (LLMs) that targets the scalability challenges of multi-step tool composition. Traditional outcome-based rewards offer only limited feedback, and trajectory-supervised rewards rely on annotated reference trajectories; TIER instead derives its supervision from function schemas and actual runtime execution. The reward decomposes into four components: format validity, schema adherence, execution success, and answer correctness. By verifying each tool-use step, it provides dense, interpretable feedback at the sequence level. Because any valid execution path is recognized, the framework accommodates multiple solution strategies and adapts to changing tool interfaces. The study is available on arXiv under ID 2605.16790.
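To make the four-part decomposition concrete, here is a minimal Python sketch of a reward in this style. It is not the paper's implementation: the tool registry `TOOLS`, the equal `WEIGHTS`, the JSON call format, the per-step averaging, and the function names `score_step` and `tier_style_reward` are all assumptions made for illustration. The point it tries to show is that no reference trajectory is consulted, so two different valid tool sequences can earn the same reward.

```python
import json

# Hypothetical tool registry: name -> (argument schema, callable).
# Illustrative only; the source does not describe concrete tools.
TOOLS = {
    "add": ({"a": (int, float), "b": (int, float)}, lambda a, b: a + b),
    "mul": ({"a": (int, float), "b": (int, float)}, lambda a, b: a * b),
}

# Hypothetical equal weights; the source does not state a weighting.
WEIGHTS = {"format": 0.25, "schema": 0.25, "execution": 0.25, "answer": 0.25}


def score_step(raw_call):
    """Check one tool call; return (format_ok, schema_ok, exec_ok, result)."""
    # Format validity: the call must parse as JSON with a name and arguments.
    try:
        call = json.loads(raw_call)
        name, args = call["name"], call["arguments"]
    except (ValueError, KeyError, TypeError):
        return False, False, False, None
    if not isinstance(name, str) or not isinstance(args, dict):
        return False, False, False, None

    # Schema adherence: known tool, exact argument names, accepted types.
    if name not in TOOLS:
        return True, False, False, None
    schema, fn = TOOLS[name]
    schema_ok = set(args) == set(schema) and all(
        isinstance(args[key], types) and not isinstance(args[key], bool)
        for key, types in schema.items()
    )
    if not schema_ok:
        return True, False, False, None

    # Execution success: the tool must run without raising.
    try:
        return True, True, True, fn(**args)
    except Exception:
        return True, True, False, None


def tier_style_reward(calls, final_answer, expected_answer):
    """Combine per-step checks and answer correctness into one reward.

    Only the steps' own validity and the final answer are scored, so the
    order and number of calls are not compared against any reference path.
    """
    steps = [score_step(call) for call in calls]
    n = max(len(steps), 1)  # avoid division by zero for empty trajectories
    components = {
        "format": sum(step[0] for step in steps) / n,
        "schema": sum(step[1] for step in steps) / n,
        "execution": sum(step[2] for step in steps) / n,
        "answer": float(final_answer == expected_answer),
    }
    total = sum(WEIGHTS[key] * value for key, value in components.items())
    return total, components
```

Under these assumptions, computing (2 + 3) * 4 as `add` then `mul`, or as two `mul` calls followed by one `add`, scores identically as long as every call is well formed and the final answer is 20. The returned `components` dictionary is what keeps the feedback interpretable: a low score can be traced to malformed calls, schema violations, failed executions, or a wrong answer.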

Key facts

TIER stands for Trajectory-Invariant Execution Rewards
Framework is for multi-step tool composition in LLMs
Derives supervision from function schemas and runtime execution
Reward decomposes into format validity, schema adherence, execution success, and answer correctness
Provides dense, interpretable sequence-level feedback
Supports multiple solution strategies
Adapts to evolving tool interfaces
Published on arXiv with ID 2605.16790
