Cross-Session AI Agent Threats: New Benchmark and Detection Methods
AI-agent guardrails share a structural weakness: they are memoryless, evaluating each message independently. An attacker can therefore spread a single attack across multiple sessions so that no individual message appears harmful and only the cumulative result contains the payload, evading session-bound detectors. To study this threat, researchers built CSTM-Bench, a benchmark of 26 executable attack types organized by kill-chain stage and by cross-session operation (accumulate, compose, launder, inject_on_reader). Each attack is tied to one of seven identity anchors that define violation as a policy predicate, and is paired with corresponding benign confounders. Released on Hugging Face as intrinsec-ai/cstm-bench, the dataset ships two 54-scenario splits: dilution (compositional) and cross_session (12 scenarios that evade detection within any single session). The work reframes cross-session detection as an information-theoretic problem and proposes detection algorithms, a step toward securing multi-session AI interactions.
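The core failure mode can be sketched in a few lines. This toy example (not taken from the benchmark; the blocklist and message fragments are hypothetical) shows an "accumulate"-style attack: a memoryless guardrail passes every fragment individually, while the same check applied to the concatenated transcript flags the assembled payload.

```python
# Hypothetical blocklist standing in for a real guardrail policy.
BLOCKLIST = {"rm -rf /"}

def memoryless_guard(message: str) -> bool:
    """Judge a single message in isolation, with no memory of prior sessions."""
    return any(bad in message for bad in BLOCKLIST)

# Attacker splits the payload across three sessions ("accumulate" operation).
fragments = ["rm ", "-rf", " /"]

per_message = [memoryless_guard(f) for f in fragments]
accumulated = memoryless_guard("".join(fragments))

print(per_message)   # each fragment passes: [False, False, False]
print(accumulated)   # the concatenation is flagged: True
```

Only the cumulative view contains the harmful payload, which is exactly the signal a session-bound detector never sees.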
Key facts
- AI-agent guardrails are memoryless, judging each message in isolation.
- Adversaries can spread attacks across sessions to evade session-bound detectors.
- CSTM-Bench covers a taxonomy of 26 executable attack types.
- Attacks are classified by kill-chain stage and cross-session operation.
- Seven identity anchors define violation as a policy predicate.
- Dataset includes Benign-pristine and Benign-hard confounders.
- Released on Hugging Face as intrinsec-ai/cstm-bench.
- Two splits: dilution (compositional) and cross_session (12 scenarios).
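A natural countermeasure implied by these facts is to key detection to an identity anchor and evaluate the policy predicate over the accumulated cross-session history rather than per message. The sketch below is an illustrative minimal design, not the paper's algorithm; the identity keys and the predicate are assumptions.

```python
from collections import defaultdict
from typing import Callable

class SessionAwareGuard:
    """Accumulate each identity's messages across sessions and re-evaluate
    a policy predicate over the whole transcript after every message."""

    def __init__(self, violates: Callable[[str], bool]):
        self.violates = violates                         # policy predicate
        self.history: dict[str, list[str]] = defaultdict(list)

    def observe(self, identity: str, message: str) -> bool:
        """Record a message and judge the cumulative transcript, not the
        message in isolation. Returns True if the policy is now violated."""
        self.history[identity].append(message)
        return self.violates("".join(self.history[identity]))

# Hypothetical predicate: the cumulative text assembles a blocked command.
guard = SessionAwareGuard(lambda text: "rm -rf /" in text)
flags = [guard.observe("user-42", frag) for frag in ["rm ", "-rf", " /"]]
print(flags)  # [False, False, True]: the final fragment completes the payload
```

Grouping history by identity anchor is what makes the violation expressible as a predicate over one principal's accumulated behavior, which is how the benchmark frames detection.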
Entities
Institutions
- Hugging Face