ARTFEED — Contemporary Art Intelligence

FragBench: New Benchmark Exposes Cross-Session LLM Attacks

ai-technology · 2026-05-13

Researchers have unveiled FragBench, a benchmark for detecting malicious prompts that attackers split into sub-prompts distributed across separate LLM sessions. Whereas existing safety benchmarks evaluate individual prompts or turns within a single conversation, FragBench targets attack signals dispersed across sessions that share no context. The benchmark is derived from 24 real-world cyber-incident campaigns and covers the full attack pipeline: multi-fragment kill chains, per-fragment safety-judge verdicts, sandboxed execution traces, and matched benign cover sessions. It defines two tasks: FragBench Attack, an adversarial rewriter that hardens fragments against a single-turn safety judge, and FragBench Defense, a graph-based user-level detector trained on the resulting interactions. By construction, the single-turn judge performs near chance on the released corpus, and four GNN variants and three classifiers are evaluated on the defense task. The work highlights a blind spot in current LLM safety evaluation and offers a tool for studying cross-session threats.
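The core failure mode can be sketched with a toy example (not FragBench code; the prompts, the keyword heuristic, and the thresholds are all illustrative assumptions): each fragment of a split-up goal looks benign when judged in isolation, but the signal is visible once scores are aggregated across a user's sessions.

```python
# Toy illustration (hypothetical, not from the FragBench corpus): a
# malicious goal split into fragments sent in separate sessions.
FRAGMENTS = [
    "How do I enumerate open ports on a host I administer?",
    "What does a reverse shell payload look like in pseudocode?",
    "How can a script persist across reboots on Linux?",
]

def single_turn_judge(prompt: str) -> float:
    """Stand-in for a per-prompt safety judge: it scores each prompt
    in isolation, with no memory of the user's other sessions.
    A crude keyword heuristic; real judges are LLM-based."""
    risky_terms = {"payload", "persist", "reverse shell"}
    hits = sum(term in prompt.lower() for term in risky_terms)
    return min(1.0, 0.2 * hits)

per_fragment = [single_turn_judge(p) for p in FRAGMENTS]

# Each fragment alone stays below a typical blocking threshold...
assert all(score < 0.5 for score in per_fragment)

# ...but a user-level view that sums evidence across sessions
# surfaces the kill chain.
assert sum(per_fragment) >= 0.5
```

This is the gap the article describes: any detector that sees one prompt at a time is, by construction, blind to the chain.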

Key facts

  • FragBench is a benchmark for cross-session LLM attacks.
  • It uses 24 real-world cyber-incident campaigns.
  • Attackers split malicious goals into sub-prompts across sessions.
  • Existing benchmarks evaluate single prompts or turns within one chat.
  • FragBench includes multi-fragment kill chains and safety-judge verdicts.
  • It has two tasks: FragBench Attack and FragBench Defense.
  • The single-turn judge is near chance on the corpus by construction.
  • Four GNN variants and three classifiers are evaluated for defense.
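A user-level graph detector of the kind FragBench Defense evaluates can be sketched as follows. This is an assumed design, not the released model: sessions are graph nodes carrying hypothetical per-session judge scores, edges link sessions belonging to the same user, and a single mean-aggregation round stands in for a GNN message-passing layer.

```python
# Minimal sketch of cross-session aggregation (assumed design, not
# FragBench's actual detector).

def message_pass(adj: dict, feats: dict) -> dict:
    """One aggregation round: each node averages its own score with
    its neighbors' scores (the core operation of a mean-pool GNN)."""
    return {
        node: (feats[node] + sum(feats[n] for n in nbrs)) / (1 + len(nbrs))
        for node, nbrs in adj.items()
    }

def flag_user(adj: dict, feats: dict, threshold: float = 0.25) -> bool:
    """Flag a user if any session node exceeds the threshold after
    cross-session aggregation."""
    pooled = message_pass(adj, feats)
    return max(pooled.values()) > threshold

# Hypothetical per-session safety-judge scores. Each of the attacker's
# fragments looks only mildly risky alone; aggregation over the user's
# session graph surfaces the pattern, while benign cover traffic stays low.
adj = {"s1": ["s2", "s3"], "s2": ["s1", "s3"], "s3": ["s1", "s2"]}
attacker_feats = {"s1": 0.3, "s2": 0.4, "s3": 0.35}
benign_feats = {"s1": 0.05, "s2": 0.1, "s3": 0.0}

assert flag_user(adj, attacker_feats) is True
assert flag_user(adj, benign_feats) is False
```

The design choice mirrors the article's framing: the decision is made at the user level over a graph of sessions, not at the level of any single prompt, which is why the single-turn judge is near chance while graph-based detectors have a signal to learn from.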

Entities

Institutions

  • arXiv

Sources