Cross-Session AI Agent Threats: New Benchmark and Detection Methods
AI-agent guardrails share a structural weakness: they are memoryless, evaluating each message independently. An attacker can therefore spread a single attack across multiple sessions so that no individual message appears harmful and only the cumulative result contains the payload, evading session-bound detectors. To study this threat, researchers built CSTM-Bench, a benchmark of 26 executable attack types organized by kill-chain stage and by cross-session operation (accumulate, compose, launder, inject_on_reader). Each attack is tied to one of seven identity anchors that define violation as a policy predicate, and is paired with corresponding benign confounders. Released on Hugging Face as intrinsec-ai/cstm-bench, the dataset ships two 54-scenario splits: dilution (compositional) and cross_session (12 scenarios that evade detection within any single session). The work reframes cross-session detection as an information-theoretic problem and proposes detection algorithms, a step toward securing multi-session AI interactions.
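The core failure mode can be sketched in a few lines. This toy example (not taken from the benchmark; the blocklist and message fragments are hypothetical) shows an "accumulate"-style attack: a memoryless guardrail passes every fragment individually, while the same check applied to the concatenated transcript flags the assembled payload.

```python
# Hypothetical blocklist standing in for a real guardrail policy.
BLOCKLIST = {"rm -rf /"}

def memoryless_guard(message: str) -> bool:
    """Judge a single message in isolation, with no memory of prior sessions."""
    return any(bad in message for bad in BLOCKLIST)

# Attacker splits the payload across three sessions ("accumulate" operation).
fragments = ["rm ", "-rf", " /"]

per_message = [memoryless_guard(f) for f in fragments]
accumulated = memoryless_guard("".join(fragments))

print(per_message)   # each fragment passes: [False, False, False]
print(accumulated)   # the concatenation is flagged: True
```

Only the cumulative view contains the harmful payload, which is exactly the signal a session-bound detector never sees.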
Key facts
- AI-agent guardrails are memoryless, judging each message in isolation.
- Adversaries can spread attacks across sessions to evade session-bound detectors.
- CSTM-Bench covers a taxonomy of 26 executable attack types.
- Attacks are classified by kill-chain stage and cross-session operation.
- Seven identity anchors define violation as a policy predicate.
- Dataset includes Benign-pristine and Benign-hard confounders.
- Released on Hugging Face as intrinsec-ai/cstm-bench.
- Two splits: dilution (compositional) and cross_session (12 scenarios).
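A natural countermeasure implied by these facts is to key detection to an identity anchor and evaluate the policy predicate over the accumulated cross-session history rather than per message. The sketch below is an illustrative minimal design, not the paper's algorithm; the identity keys and the predicate are assumptions.

```python
from collections import defaultdict
from typing import Callable

class SessionAwareGuard:
    """Accumulate each identity's messages across sessions and re-evaluate
    a policy predicate over the whole transcript after every message."""

    def __init__(self, violates: Callable[[str], bool]):
        self.violates = violates                         # policy predicate
        self.history: dict[str, list[str]] = defaultdict(list)

    def observe(self, identity: str, message: str) -> bool:
        """Record a message and judge the cumulative transcript, not the
        message in isolation. Returns True if the policy is now violated."""
        self.history[identity].append(message)
        return self.violates("".join(self.history[identity]))

# Hypothetical predicate: the cumulative text assembles a blocked command.
guard = SessionAwareGuard(lambda text: "rm -rf /" in text)
flags = [guard.observe("user-42", frag) for frag in ["rm ", "-rf", " /"]]
print(flags)  # [False, False, True]: the final fragment completes the payload
```

Grouping history by identity anchor is what makes the violation expressible as a predicate over one principal's accumulated behavior, which is how the benchmark frames detection.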
Entities
Institutions
- Hugging Face