MCPShield: Graph-Based Attack Detection for LLM Tool-Call Traffic
MCPShield is a framework for detecting attacks on Model Context Protocol (MCP) tool-call traffic in LLM agents. It represents each agent session as a graph in which tool calls are nodes and sequential and data-flow connections are edges; node features are sentence embeddings of each call's arguments and responses. The system classifies each session as benign or attacked. Evaluations on RAS-Eval, ATBench, and a combined-source variant compare three GNN architectures (GAT, GCN, GraphSAGE), a non-graph MLP, and classical models (XGBoost, random forest, logistic regression, linear SVM). GraphSAGE is retained as the GNN baseline for ATBench and the combined-source variant. Content-level features prove essential for detection, while metadata-only approaches underperform.
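The session-to-graph encoding described above can be sketched roughly as follows. This is a minimal illustration, not MCPShield's implementation: the `ToolCall` and `SessionGraph` types, the `toy_embed` placeholder (a hash-based stand-in for a real sentence-embedding model), and the substring rule used to infer data-flow edges are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One tool call in a session (hypothetical record layout)."""
    tool: str
    arguments: str
    response: str


@dataclass
class SessionGraph:
    features: list[list[float]] = field(default_factory=list)  # one vector per node
    edges: set[tuple[int, int, str]] = field(default_factory=set)  # (src, dst, kind)


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Placeholder for a sentence-embedding model (assumption, not a real encoder)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def build_session_graph(calls: list[ToolCall], dim: int = 8) -> SessionGraph:
    graph = SessionGraph()
    # Node features: embedding of the arguments concatenated with embedding of the response.
    for call in calls:
        graph.features.append(toy_embed(call.arguments, dim) + toy_embed(call.response, dim))
    # Sequential edges link each call to the next one in the session.
    for i in range(len(calls) - 1):
        graph.edges.add((i, i + 1, "sequential"))
    # Data-flow edges (assumed rule): an earlier response reappears in later arguments.
    for i, src in enumerate(calls):
        for j in range(i + 1, len(calls)):
            if src.response and src.response in calls[j].arguments:
                graph.edges.add((i, j, "data_flow"))
    return graph


calls = [
    ToolCall("search_files", "query=report", "/tmp/report.txt"),
    ToolCall("read_file", "path=/tmp/report.txt", "Q3 revenue summary"),
    ToolCall("send_email", "body=Q3 revenue summary", "sent"),
]
graph = build_session_graph(calls)
print(sorted(graph.edges))
```

In this toy session the file path returned by the first call feeds the second, and the file contents feed the third, so two data-flow edges appear alongside the two sequential ones. A real pipeline would swap `toy_embed` for an actual sentence encoder and would likely need a more robust data-flow heuristic than exact substring matching.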
Key facts
- MCPShield is an attack detection framework for MCP tool-call traffic.
- It encodes agent sessions as graphs with tool calls as nodes.
- Nodes are enriched with sentence-embedding features from arguments and responses.
- Three GNN architectures are evaluated: GAT, GCN, GraphSAGE.
- Classical baselines include XGBoost, random forest, logistic regression, linear SVM.
- GraphSAGE is retained as the GNN baseline on ATBench and the combined-source variant.
- Content-level features are essential for detection.
- Metadata-only approaches underperform.
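To make the graph-classification step concrete, here is a small pure-Python sketch of a GraphSAGE-style layer with a mean aggregator, followed by mean pooling and a logistic readout that scores a whole session. The layer sizes, the mean-pooling readout, the direction of message passing, and the random untrained weights are assumptions chosen for illustration; the source does not specify these details, and the resulting score carries no meaning until the weights are trained.

```python
import math
import random


def mean_vectors(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def sage_mean_layer(features, edges, weight):
    """One mean-aggregator layer: h_v' = ReLU(W . [h_v ; mean(h_u for u in N(v))])."""
    dim = len(features[0])
    neighbors = {v: [] for v in range(len(features))}
    for src, dst in edges:
        neighbors[dst].append(src)  # assumed: messages follow edge direction
    out = []
    for v, h_v in enumerate(features):
        agg = mean_vectors([features[u] for u in neighbors[v]], dim)
        concat = h_v + agg  # list concatenation: [h_v ; aggregated neighbors]
        out.append([max(0.0, sum(w * x for w, x in zip(row, concat))) for row in weight])
    return out


def classify_session(features, edges, weight, readout, bias=0.0):
    """Mean-pool node embeddings, then apply a logistic readout (attack probability)."""
    hidden = sage_mean_layer(features, edges, weight)
    pooled = mean_vectors(hidden, len(weight))
    logit = sum(r * p for r, p in zip(readout, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
features = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]  # toy 2-dimensional node features
edges = [(0, 1), (1, 2), (0, 2)]                 # (src, dst) pairs
hidden_dim = 3
weight = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(hidden_dim)]  # untrained
readout = [rng.uniform(-1, 1) for _ in range(hidden_dim)]                     # untrained
score = classify_session(features, edges, weight, readout)
print(f"untrained session score: {score:.3f}")
```

In practice a library such as PyTorch Geometric would replace the hand-written layer, and the weights would be learned from labeled benign and attacked sessions.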
Entities
Institutions
- arXiv