TRACE: A Reference-Free Framework for Evaluating Tool-Augmented LLMs

other · 2026-05-16

TRACE is a reference-free framework for multi-dimensional evaluation of tool-augmented large language models, meaning it does not require ground-truth trajectories. It uses an evidence bank that accumulates knowledge from preceding reasoning steps and draws on it to assess a reasoning trajectory for efficiency, hallucination, and adaptivity. To validate the framework, the researchers built a meta-evaluation dataset of diverse flawed trajectories, each labeled with multi-faceted performance scores. The reported results indicate that TRACE evaluates these complex trajectories accurately, even when using small open-source LLMs. The work targets a limitation of existing benchmarks, which rely solely on answer matching and therefore overlook how the model reached its answer. The paper is available on arXiv under the identifier 2510.02837.

Key facts

TRACE is a reference-free framework for multi-dimensional evaluation of tool-augmented LLMs
It incorporates an evidence bank to accumulate knowledge from preceding steps
Evaluation covers efficiency, hallucination, and adaptivity of reasoning trajectories
A new meta-evaluation dataset with diverse flawed trajectories was developed
Each trajectory is labeled with multi-faceted performance scores
TRACE accurately evaluates complex trajectories even with small open-source LLMs
Current benchmarks are limited to answer matching and neglect trajectory aspects
Paper available on arXiv with identifier 2510.02837
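
The evidence-bank idea in the key facts above can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not describe TRACE's prompts, scoring rules, or data schema, and the framework presumably uses an LLM as the judge. Here the judge is replaced by toy exact-match heuristics, and every name (`Step`, `EvidenceBank`, `evaluate`, the field names, and the three score formulas) is a hypothetical chosen for illustration. The only part taken from the source is the shape of the idea: accumulate evidence from preceding steps, then score a trajectory for efficiency, hallucination, and adaptivity without a reference trajectory.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a tool-use trajectory (hypothetical schema)."""
    tool: str
    args: str
    observation: str                                  # what the tool returned
    claims: list[str] = field(default_factory=list)   # facts asserted at this step
    failed: bool = False                              # whether the tool call errored


@dataclass
class EvidenceBank:
    """Accumulates tool calls and observations from preceding steps."""
    calls: set[tuple[str, str]] = field(default_factory=set)
    facts: set[str] = field(default_factory=set)

    def add(self, step: Step) -> None:
        self.calls.add((step.tool, step.args))
        if not step.failed:
            self.facts.add(step.observation)


def evaluate(trajectory: list[Step]) -> dict[str, float]:
    """Score a trajectory using only evidence gathered along the way."""
    bank = EvidenceBank()
    redundant = unsupported = total_claims = 0
    failures = recoveries = 0
    prev = None
    for step in trajectory:
        # Efficiency (toy rule): an identical call already in the bank is redundant.
        if (step.tool, step.args) in bank.calls:
            redundant += 1
        # Adaptivity (toy rule): after a failed call, did the next call change?
        if prev is not None and prev.failed:
            failures += 1
            if (step.tool, step.args) != (prev.tool, prev.args):
                recoveries += 1
        bank.add(step)
        # Hallucination (toy rule): a claim must exactly match an observation
        # in the bank, including the one from the current step.
        for claim in step.claims:
            total_claims += 1
            if claim not in bank.facts:
                unsupported += 1
        prev = step
    n = len(trajectory)
    return {
        "efficiency": 1 - redundant / n if n else 1.0,
        "hallucination": unsupported / total_claims if total_claims else 0.0,
        "adaptivity": recoveries / failures if failures else 1.0,
    }


trajectory = [
    Step("search", "capital of France", "Paris", claims=["Paris"]),
    Step("search", "capital of France", "Paris"),            # repeated call
    Step("weather", "Pariss", "error", failed=True),         # failed call
    Step("weather", "Paris", "18C", claims=["18C", "sunny"]),  # "sunny" is unsupported
]
scores = evaluate(trajectory)
print(scores)
```

In this toy trajectory one of four calls is redundant, one of three claims has no supporting observation, and the single failed call is followed by a corrected one. A real evaluator would replace the exact-match checks with model judgments over the contents of the evidence bank.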
