ARTFEED — Contemporary Art Intelligence

ExCyTIn-Bench: First Benchmark for LLM Agents in Cyber Threat Investigation

other · 2026-05-04

Researchers have introduced ExCyTIn-Bench, the first benchmark for evaluating LLM agents on cyber threat investigation. The benchmark poses security questions derived from investigation graphs built in a controlled Azure tenant. It provides a SQL environment covering 57 log tables from Microsoft Sentinel and related services, along with 7542 generated questions. To produce the questions, expert-crafted detection logic is first used to construct threat investigation graphs; LLMs then generate a question from each pair of nodes, with the start node serving as background context and the end node as the answer. The work addresses the automation of threat investigation, a task that involves navigating diverse security logs and tracing multi-hop evidence chains, and aims to support the development of LLM-driven agents for it.
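The pairing step described above can be illustrated with a minimal sketch. It is not the benchmark's actual implementation: the toy graph, the node labels, and the names `QuestionSeed`, `hop_distances`, `question_seeds`, and `build_prompt` are all hypothetical, and the LLM call itself is left out. The sketch only shows how a start node (context) and an end node (answer) might be paired, and how many hops separate them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionSeed:
    context: str  # text of the start node, given as background
    answer: str   # text of the end node, the expected answer
    hops: int     # edges on the shortest path between the two nodes


def hop_distances(graph, start):
    """Breadth-first search: shortest hop count from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def question_seeds(graph, labels):
    """Pair every node with every other node reachable from it."""
    seeds = []
    for start in graph:
        for end, hops in hop_distances(graph, start).items():
            if end != start:
                seeds.append(QuestionSeed(labels[start], labels[end], hops))
    return seeds


def build_prompt(seed):
    """Prompt text one might send to an LLM to phrase the question (assumed wording)."""
    return (
        "Background: " + seed.context + "\n"
        "Write a question whose answer is: " + seed.answer
    )


# Hypothetical toy investigation graph: two alerts linked through one host.
graph = {
    "alert_1": ["host_a"],
    "host_a": ["alert_1", "alert_2"],
    "alert_2": ["host_a"],
}
labels = {
    "alert_1": "Suspicious sign-in alert",
    "host_a": "Host A",
    "alert_2": "Unusual process execution alert",
}

seeds = question_seeds(graph, labels)
for seed in seeds:
    print(seed.hops, "|", seed.context, "->", seed.answer)
```

In this toy graph, pairing the two alerts yields a two-hop seed, a small instance of the kind of multi-hop evidence chain an agent would have to trace through the logs.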

Key facts

  • ExCyTIn-Bench is the first benchmark for evaluating LLM agents on cyber threat investigation.
  • The benchmark uses security questions derived from investigation graphs.
  • It is built from a controlled Azure tenant environment.
  • The SQL environment covers 57 log tables from Microsoft Sentinel and related services.
  • The benchmark includes 7542 generated questions.
  • Questions are generated using expert-crafted detection logic and paired nodes on investigation graphs.
  • The start node serves as background context and the end node as the answer.
  • The work aims to automate threat investigation using LLM agents.

Entities

Institutions

  • Microsoft Sentinel
  • Azure
