ARTFEED — Contemporary Art Intelligence

Auto-ARGUE: LLM-Based Report Generation Evaluation Tool

other · 2026-04-30

Researchers have developed Auto-ARGUE, an LLM-based implementation of the ARGUE framework for evaluating report generation in retrieval-augmented generation (RAG) systems. The tool addresses a gap in open-source evaluation tooling for citation-supported report generation. Evaluations on the TREC 2024 NeuCLIR report generation pilot task and two tasks from the TREC 2024 RAG track show strong system-level correlations with human judgments. The team has also released ARGUE-Viz, a web application for visualizing and analyzing Auto-ARGUE's judgments and scores in detail. The work has been submitted to arXiv under computer science, information retrieval (cs.IR).

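This digest contains no code; purely as an illustration, the sketch below shows the kind of per-sentence attribution check an ARGUE-style, LLM-based evaluator performs, asking a model whether the cited document supports a report sentence. The prompt, function name, model choice, and use of an OpenAI-compatible client are assumptions made for the sketch, not Auto-ARGUE's actual interface.

    # Illustrative sketch only: a single attribution check of the kind an
    # ARGUE-style evaluator performs. Prompt, model, and names are assumptions,
    # not the actual Auto-ARGUE interface.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    PROMPT = (
        "You are judging a report sentence against the document it cites.\n"
        "Answer YES if the document fully supports the sentence, otherwise NO.\n\n"
        "Cited document:\n{doc}\n\nReport sentence:\n{sentence}\n\nAnswer:"
    )

    def sentence_is_supported(sentence, cited_doc, model="gpt-4o-mini"):
        """Ask an LLM judge whether a citation-backed sentence is supported."""
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": PROMPT.format(doc=cited_doc, sentence=sentence)}],
            temperature=0.0,
        )
        return response.choices[0].message.content.strip().upper().startswith("YES")
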
Key facts

  • Auto-ARGUE is an LLM-based implementation of the ARGUE framework.
  • It evaluates report generation in RAG systems.
  • Open-source tools for report generation evaluation were lacking.
  • Analysis was performed on TREC 2024 NeuCLIR and RAG track tasks.
  • Results show good system-level correlations with human judgments (see the correlation sketch after this list).
  • ARGUE-Viz is a web app for visualization and analysis.
  • The submission is on arXiv under computer science, information retrieval (cs.IR).
  • The tool focuses on citation-backed report generation.
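How system-level agreement is quantified is not spelled out in this summary; the sketch below is a minimal, self-contained illustration using made-up scores and two common statistics for this purpose, Pearson's r and Kendall's tau via scipy. It does not reproduce the paper's numbers or its choice of statistic.

    # Illustrative sketch only: measuring system-level agreement between an
    # automatic metric and human judgments. All scores below are made up.
    from scipy.stats import kendalltau, pearsonr

    # One aggregate score per system: automatic (e.g., Auto-ARGUE-style) vs. human.
    auto_scores = {"sysA": 0.71, "sysB": 0.64, "sysC": 0.52, "sysD": 0.40}
    human_scores = {"sysA": 0.75, "sysB": 0.60, "sysC": 0.55, "sysD": 0.35}

    systems = sorted(auto_scores)
    x = [auto_scores[s] for s in systems]
    y = [human_scores[s] for s in systems]

    # Pearson's r captures linear agreement; Kendall's tau captures whether
    # the two metrics rank the systems in the same order.
    print("Pearson r:   %.3f" % pearsonr(x, y)[0])
    print("Kendall tau: %.3f" % kendalltau(x, y)[0])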

Entities

Institutions

  • arXiv
  • TREC

Tasks and methods

  • NeuCLIR (TREC track)
  • RAG (retrieval-augmented generation)

Sources