MCPShield: Graph-Based Attack Detection for LLM Tool-Call Traffic

ai-technology · 2026-05-13

MCPShield is a framework for detecting attacks on Model Context Protocol (MCP) tool-call traffic in LLM agents. It represents each agent session as a graph in which tool calls are nodes and sequential and data-flow relationships are edges, and it enriches each node with sentence-embedding features computed from the call's arguments and responses. Each session is then classified as benign or attacked. The evaluation, run on RAS-Eval, ATBench, and a combined-source variant, compares three GNN architectures (GAT, GCN, GraphSAGE), a non-graph MLP, and classical models (XGBoost, random forest, logistic regression, linear SVM). GraphSAGE serves as the GNN baseline on ATBench and the combined-source variant. The results show that content-level features are essential for detection, while metadata-only approaches fall short.
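
The session-to-graph encoding described above can be sketched as follows. This is a minimal illustration, not MCPShield's implementation. The `ToolCall` record, the `build_session_graph` helper, the rule that a data-flow edge exists when a token from an earlier response reappears in a later call's arguments, and the hashed bag-of-words `embed` function (a stand-in for a real sentence-embedding model) are all hypothetical choices made for the example.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class ToolCall:
    # Hypothetical record for one MCP tool call in an agent session.
    name: str
    arguments: dict
    response: str


def embed(text: str, dim: int = 16) -> list[float]:
    """Stand-in for a sentence encoder: hashed bag-of-words, L2-normalized.

    A real system would call a sentence-embedding model here (assumption).
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_session_graph(calls: list[ToolCall], dim: int = 16):
    """Return (node_features, edges) for one agent session.

    Nodes: one per tool call, featured by [embed(arguments) || embed(response)].
    Edges: sequential edges (i, i+1), plus data-flow edges (i, j) when a
    whitespace token of call i's response appears in call j's argument values
    (a hypothetical heuristic).
    """
    features = []
    for call in calls:
        arg_text = " ".join(f"{k} {v}" for k, v in call.arguments.items())
        features.append(embed(arg_text, dim) + embed(call.response, dim))

    edges = set()
    for i in range(len(calls) - 1):
        edges.add((i, i + 1))  # sequential edge
    for i, src in enumerate(calls):
        produced = set(src.response.split())
        for j in range(i + 1, len(calls)):
            arg_values = " ".join(str(v) for v in calls[j].arguments.values())
            if produced & set(arg_values.split()):
                edges.add((i, j))  # data-flow edge
    return features, sorted(edges)


# Usage: the path returned by call 0 flows into call 2's arguments.
session = [
    ToolCall("search_files", {"query": "invoices"}, "found /tmp/inv_2024.csv"),
    ToolCall("list_dir", {"path": "/home"}, "docs photos"),
    ToolCall("read_file", {"path": "/tmp/inv_2024.csv"}, "total 1200"),
]
features, edges = build_session_graph(session)
print(edges)  # [(0, 1), (0, 2), (1, 2)]
print(len(features), len(features[0]))  # 3 32
```

Concatenating argument and response embeddings keeps both sides of each call visible to the classifier, which matters if, as reported, content-level features drive detection.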

Key facts

MCPShield is an attack detection framework for MCP tool-call traffic.
It encodes agent sessions as graphs with tool calls as nodes.
Nodes are enriched with sentence-embedding features from arguments and responses.
Three GNN architectures are evaluated: GAT, GCN, GraphSAGE.
Classical baselines include XGBoost, random forest, logistic regression, and linear SVM.
GraphSAGE is the GNN baseline on ATBench and the combined-source variant.
Content-level features are essential for detection.
Metadata-only approaches underperform.
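
GraphSAGE, the GNN baseline named above, updates each node by combining its own features with the mean of its neighbors' features; a session-level decision then needs a readout that pools node embeddings into one vector. The sketch below shows one mean-aggregator layer, a mean-pooling readout, and a logistic output for the benign/attacked score. It uses plain Python lists and untrained, hand-set weights purely for illustration; the single layer, the ReLU, mean pooling, and the `sage_mean_layer` and `session_attack_score` helpers are assumptions, not MCPShield's published configuration.

```python
import math


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    # Dense matrix-vector product: one output per row of w.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sage_mean_layer(features, edges, weight):
    """One GraphSAGE layer with the mean aggregator.

    h_v' = ReLU(W @ concat(h_v, mean of h_u over neighbors u));
    edges are treated as undirected.
    """
    n = len(features)
    neighbors = {v: set() for v in range(n)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    dim = len(features[0])
    out = []
    for v in range(n):
        if neighbors[v]:
            agg = [sum(features[u][k] for u in neighbors[v]) / len(neighbors[v])
                   for k in range(dim)]
        else:
            agg = [0.0] * dim  # isolated node: no neighbor signal
        h = matvec(weight, features[v] + agg)
        out.append([max(0.0, x) for x in h])  # ReLU
    return out


def session_attack_score(features, edges, weight, readout_w, bias=0.0):
    """Mean-pool node embeddings into a session vector, then apply a sigmoid."""
    hidden = sage_mean_layer(features, edges, weight)
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
    logit = sum(w * x for w, x in zip(readout_w, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy session: 3 tool-call nodes with 2-d features (illustrative numbers only).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
weight = [[1.0, 0.0, 0.5, 0.0],   # 2 x 4: acts on [self || neighbor mean]
          [0.0, 1.0, 0.0, 0.5]]
readout_w = [1.0, -1.0]

print(sage_mean_layer(features, edges, weight))  # [[1.0, 0.5], [0.5, 1.25], [1.0, 1.5]]
score = session_attack_score(features, edges, weight, readout_w)
print(round(score, 4))  # 0.4378
```

In a trained model the weights would be learned from labeled benign and attacked sessions; separating the self term from the neighbor mean is what lets GraphSAGE weigh a call's own content against the context of the calls around it.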
