ARTFEED — Contemporary Art Intelligence

New AI System Diagnoses LLM Agent Failures at Scale

ai-technology · 2026-05-22

A new multi-agent system called the Insights Generator (IG) has been developed by researchers to identify systematic behavioral trends in large language model (LLM) agents by examining comprehensive execution trace corpora. The findings, detailed in the arXiv preprint 2605.21347, tackle the challenges of corpus-level trace diagnostics, overcoming the inadequacies of manual reviews that overlook broader population trends and are not suitable for production settings where traces can extend to tens of thousands of tokens. IG formulates and tests hypotheses across trace groups to answer diagnostic inquiries, generating evidence-based insights reports. The evaluation of the system included qualitative and objective metrics, such as rubric-based assessments and performance enhancements resulting from IG's suggestions, underscoring a move towards automated, scalable debugging for LLM agents in complex applications.

Key facts

  • arXiv preprint 2605.21347 introduces the Insights Generator (IG)
  • IG is a multi-agent system for corpus-level trace diagnostics
  • It analyzes entire corpora of execution traces to identify systematic behavioral patterns
  • Manual inspection of traces is limited to small subsets and ad-hoc hypotheses
  • Individual traces in production can span tens of thousands of tokens
  • IG answers diagnostic questions by proposing and testing hypotheses
  • Evaluation includes rubric-based report assessment and downstream performance improvements
  • The system produces evidence-backed insights reports

Entities

Institutions

  • arXiv

Sources