Emotion-Style Dynamic Backdoor Attack on LLMs

ai-technology · 2026-05-13

A new backdoor attack method, Paraesthesia, targets large language models (LLMs) by using emotion as a dynamic trigger instead of static tokens. The attack exploits the observation that emotion can be decoupled from semantics in an LLM's representation space, where emotional styles form distinct clusters. Mixing emotion-triggered samples into otherwise clean fine-tuning data leaves the resulting model prone to producing harmful outputs whenever the emotional cue is present. The authors present this as stealthier and more resilient against detection than traditional token-level attacks.
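The data-mixing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: the helper `mix_training_set`, the `Sample` record, and the `poison_rate` parameter are hypothetical names introduced here, and the summary does not say what mixing ratio Paraesthesia actually uses. Generating the emotionally styled prompts is out of scope, so prompts and responses are placeholder strings.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    prompt: str
    response: str
    triggered: bool  # True if the prompt carries the emotional style cue


def mix_training_set(clean, triggered, poison_rate, seed=0):
    """Hypothetical helper: combine clean and triggered samples so that the
    triggered ones make up `poison_rate` of the final shuffled set."""
    if not 0.0 <= poison_rate < 1.0:
        raise ValueError("poison_rate must be in [0, 1)")
    # Solve n_trig / (len(clean) + n_trig) = poison_rate for n_trig.
    n_trig = round(len(clean) * poison_rate / (1.0 - poison_rate))
    if n_trig > len(triggered):
        raise ValueError("not enough triggered samples for this rate")
    rng = random.Random(seed)
    # Draw the triggered subset without replacement, then shuffle everything
    # so triggered samples are interleaved with clean ones.
    mixed = list(clean) + rng.sample(list(triggered), n_trig)
    rng.shuffle(mixed)
    return mixed
```

For example, 95 clean samples mixed at `poison_rate=0.05` yield a 100-sample set in which 5 samples are triggered. Solving for the triggered count from the clean-set size keeps the final ratio exact instead of drifting as samples are added.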

Key facts

Backdoor vulnerabilities exist in LLM fine-tuning.
Most prior attacks use token-level triggers.
Static triggers are easy to detect, and their effect weakens under clean fine-tuning.
Emotion acts as a global stylistic factor, expressed through tone.
Emotion can be decoupled from semantics in LLM representation space.
Paraesthesia uses emotion as a dynamic backdoor trigger.
The attack mixes emotion-triggered samples into clean fine-tuning data.
The method is proposed in a paper on arXiv (2605.11612).
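To make "emotion forms distinct clusters" concrete, the sketch below computes a simple separation ratio over embedding vectors: the mean distance between class centroids divided by the mean distance of each vector to its own class centroid. This metric, the function names, and the 2-D toy vectors are assumptions chosen for illustration; the summary does not describe how the paper measures clustering, and real hidden states would be high-dimensional.

```python
import math
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def separation_ratio(vectors, labels):
    """Hypothetical diagnostic: mean pairwise distance between class centroids
    divided by the mean distance of each vector to its own class centroid.
    Larger values mean tighter, better-separated clusters."""
    groups = defaultdict(list)
    for v, y in zip(vectors, labels):
        groups[y].append(v)
    if len(groups) < 2:
        raise ValueError("need at least two classes")
    cents = {y: centroid(pts) for y, pts in groups.items()}
    within = [math.dist(v, cents[y]) for v, y in zip(vectors, labels)]
    mean_within = sum(within) / len(within)
    if mean_within == 0:
        raise ValueError("classes have zero spread")
    keys = sorted(cents)
    between = [math.dist(cents[a], cents[b])
               for i, a in enumerate(keys) for b in keys[i + 1:]]
    return (sum(between) / len(between)) / mean_within


# Toy data: axis 0 encodes emotion, axis 1 encodes topic.
vecs = [(-1.0, -0.1), (-1.0, 0.1), (1.0, -0.1), (1.0, 0.1)]
emotion = ["calm", "calm", "angry", "angry"]
topic = ["A", "B", "A", "B"]
print(separation_ratio(vecs, emotion), separation_ratio(vecs, topic))
```

On these toy points the ratio is about 20 when grouping by emotion and about 0.2 when grouping by topic: emotion varies along its own direction, independent of topic, which is the kind of decoupling the attack relies on.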
