ARTFEED — Contemporary Art Intelligence

TRACE: A Reference-Free Framework for Evaluating Tool-Augmented LLMs

other · 2026-05-16

TRACE is a new framework for multi-dimensional evaluation of tool-augmented large language models (LLMs) that requires no ground-truth reference trajectories. It uses an evidence bank that accumulates knowledge from preceding steps, and it evaluates reasoning trajectories for efficiency, hallucination, and adaptivity. To meta-evaluate the framework, the researchers built a dataset of diverse flawed trajectories, each labeled with multi-faceted performance scores. Their results show that TRACE accurately evaluates complex trajectories, even when it runs on small open-source LLMs. The work targets a gap in existing benchmarks, which rely solely on answer matching and ignore essential properties of the trajectory that produced the answer. The paper is available on arXiv under identifier 2510.02837.
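To make the evidence-bank idea concrete, here is a minimal Python sketch. It is not TRACE's method: TRACE judges trajectories with LLMs, whereas this toy version swaps in simple string and equality checks. The `Step` schema, the `EvidenceBank` class, the `evaluate` function, and the three scoring heuristics (repeated identical tool calls for efficiency, claims absent from earlier observations for hallucination, changing the call after a failure for adaptivity) are all hypothetical. The one assumption it shares with the description above is that each step is judged only against evidence accumulated from the steps before it.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a tool-augmented trajectory (hypothetical schema)."""
    tool: str                 # name of the tool the model called
    args: tuple               # arguments passed to the tool (must be hashable)
    observation: str          # what the tool returned
    claims: list[str] = field(default_factory=list)  # facts asserted at this step
    failed: bool = False      # whether the tool call returned an error


@dataclass
class EvidenceBank:
    """Accumulates observations from preceding steps (illustrative only)."""
    observations: list[str] = field(default_factory=list)
    seen_calls: set = field(default_factory=set)

    def supports(self, claim: str) -> bool:
        # Naive stand-in for an LLM judge: a claim counts as grounded
        # only if it appears verbatim in some earlier observation.
        return any(claim in obs for obs in self.observations)

    def add(self, step: Step) -> None:
        self.observations.append(step.observation)
        self.seen_calls.add((step.tool, step.args))


def evaluate(trajectory: list[Step]) -> dict[str, float]:
    """Score a trajectory with hypothetical heuristics, one step at a time."""
    bank = EvidenceBank()
    redundant = unsupported = total_claims = 0
    failures = recoveries = 0
    for i, step in enumerate(trajectory):
        # Efficiency: repeating an identical call adds no new evidence.
        if (step.tool, step.args) in bank.seen_calls:
            redundant += 1
        # Hallucination: claims not backed by evidence from earlier steps.
        for claim in step.claims:
            total_claims += 1
            if not bank.supports(claim):
                unsupported += 1
        # Adaptivity: after a failed call (not the last step), does the
        # next step try a different call instead of repeating it?
        if step.failed and i + 1 < len(trajectory):
            failures += 1
            nxt = trajectory[i + 1]
            if (nxt.tool, nxt.args) != (step.tool, step.args):
                recoveries += 1
        # Only now does this step's observation become evidence.
        bank.add(step)
    n = len(trajectory)
    return {
        "efficiency": 1 - redundant / n if n else 1.0,
        "hallucination": unsupported / total_claims if total_claims else 0.0,
        "adaptivity": recoveries / failures if failures else 1.0,
    }


# Toy trajectory (hypothetical data)
traj = [
    Step("search", ("Eiffel Tower height",), "The Eiffel Tower is 330 m tall."),
    Step("search", ("Eiffel Tower height",), "The Eiffel Tower is 330 m tall."),
    Step("calculator", ("330*2",), "error: bad input",
         claims=["330 m tall"], failed=True),
    Step("calculator", ("330 * 2",), "660", claims=["built in 1889"]),
]
print(evaluate(traj))  # {'efficiency': 0.75, 'hallucination': 0.5, 'adaptivity': 1.0}
```

In this toy trajectory the second search repeats the first, which lowers efficiency, and the claim "built in 1889" counts as unsupported: it happens to be true, but no earlier observation grounds it. Checking claims before adding the current step's observation is a deliberate choice that mirrors judging each step against preceding steps only.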

Key facts

  • TRACE is a reference-free framework for multi-dimensional evaluation of tool-augmented LLMs
  • It incorporates an evidence bank to accumulate knowledge from preceding steps
  • Evaluation covers efficiency, hallucination, and adaptivity of reasoning trajectories
  • A new meta-evaluation dataset with diverse flawed trajectories was developed
  • Each trajectory is labeled with multi-faceted performance scores
  • TRACE accurately evaluates complex trajectories even with small open-source LLMs
  • Current benchmarks are limited to answer matching and neglect trajectory aspects
  • Paper available on arXiv with identifier 2510.02837
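The summary does not say how agreement between TRACE and the labeled scores is measured. One common way to meta-evaluate an automatic evaluator, assumed here rather than taken from the paper, is to compute a per-dimension correlation between its scores and the labels. The sketch below uses Pearson correlation; the function names `pearson` and `meta_evaluate` and all the scores are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (sx * sy)


def meta_evaluate(predicted: list[dict[str, float]],
                  gold: list[dict[str, float]]) -> dict[str, float]:
    """Per-dimension correlation between evaluator scores and labeled scores.

    Hypothetical helper: each list holds one score dict per trajectory,
    keyed by the same dimension names (e.g. "efficiency", "hallucination").
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need one predicted score dict per labeled trajectory")
    return {dim: pearson([p[dim] for p in predicted], [g[dim] for g in gold])
            for dim in gold[0]}


# Toy labels (hypothetical) for three flawed trajectories
gold = [
    {"efficiency": 1.0, "hallucination": 0.0},
    {"efficiency": 0.5, "hallucination": 0.5},
    {"efficiency": 0.0, "hallucination": 1.0},
]
# Toy evaluator output: tracks efficiency but inverts hallucination
predicted = [
    {"efficiency": 0.8, "hallucination": 0.9},
    {"efficiency": 0.5, "hallucination": 0.5},
    {"efficiency": 0.2, "hallucination": 0.1},
]
print(meta_evaluate(predicted, gold))  # efficiency close to 1.0, hallucination close to -1.0
```

A value near 1 means the evaluator orders trajectories the way the labels do; a value near -1, as for the inverted hallucination scores above, means it systematically disagrees. Rank correlations such as Spearman's are a common alternative when only the ordering matters.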

Entities

Institutions

  • arXiv

Sources