ARTFEED — Contemporary Art Intelligence

Backdoor Attacks on KG-Enhanced LLMs via Soft Prompts

other · 2026-05-13

A recent study finds that backdoor attacks targeting the text channel largely fail against large language models (LLMs) enhanced with knowledge graphs (KGs) via soft prompts. These systems encode retrieved subgraphs into continuous soft prompts through graph neural networks, yielding a dual-channel architecture in which graph and text inputs jointly condition the model. The researchers identify a robustness gap and attribute it to semantic anchoring: graph-derived soft prompts steer hidden states toward semantics consistent with the query, suppressing surface-level malicious instructions injected through text. Because this anchoring effect originates in the graph channel itself, the paper (arXiv 2605.11996v1) argues that effective attacks would need to target the graph-conditioned channel directly.

Key facts

  • arXiv paper 2605.11996v1 examines backdoor attacks on KG-enhanced LLMs with soft prompts.
  • KG-enhanced LLMs use graph neural networks to encode subgraphs into soft prompts.
  • Text-channel backdoor attacks are largely ineffective against soft-prompt-based systems.
  • Semantic anchoring by graph-derived soft prompts suppresses surface-level malicious instructions.
  • The robustness gap is due to the graph channel's conditioning effect.
  • Existing attacks are designed for the textual channel, not the dual-channel architecture.
  • The study calls for new attack methods targeting the graph-conditioned channel.
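The dual-channel setup described above can be sketched in miniature: a toy graph encoder pools a retrieved subgraph into a few continuous soft-prompt vectors, which are prepended to the token embeddings of the (possibly trigger-laced) query. All dimensions, the mean-aggregation encoder, and every variable name here are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not taken from the paper).
d_node, d_model, n_prompt = 16, 32, 4

def encode_subgraph(node_feats, adj, w_msg, w_proj):
    """One round of mean-aggregation message passing, then projection of
    the pooled graph representation into n_prompt soft-prompt vectors."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    h = np.tanh((adj @ node_feats) / deg @ w_msg)    # neighbor aggregation
    g = h.mean(axis=0)                               # graph-level pooling
    return (g @ w_proj).reshape(n_prompt, d_model)   # continuous soft prompts

# Toy retrieved subgraph: 5 entities with random features and edges.
nodes = rng.normal(size=(5, d_node))
adj = (rng.random((5, 5)) > 0.5).astype(float)
w_msg = rng.normal(size=(d_node, d_node))
w_proj = rng.normal(size=(d_node, n_prompt * d_model))

soft_prompts = encode_subgraph(nodes, adj, w_msg, w_proj)

# Text channel: token embeddings of the query, where a textual backdoor
# trigger would live. Shape: (n_tokens, d_model).
token_embs = rng.normal(size=(10, d_model))

# Dual-channel input: graph-derived soft prompts are prepended to the
# token sequence, so they condition every subsequent hidden state --
# the mechanism the study credits with anchoring semantics to the graph.
llm_input = np.concatenate([soft_prompts, token_embs], axis=0)
print(llm_input.shape)  # (n_prompt + n_tokens, d_model)
```

The sketch only illustrates why a text-channel trigger competes with a graph-conditioned prefix it cannot modify; a real system would use a trained GNN and a full transformer rather than random projections.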

Entities

Institutions

  • arXiv

Sources