New Backdoor Attack Exploits LLM Compilation Optimizations
Researchers have shown that techniques for optimizing inference in large language models (LLMs), in particular compilation, can be manipulated to introduce covert backdoors. The proposed attack framework has two methods: one changes the model's predictions for designated inputs only when the model is compiled, while the other uses a universal trigger that stays dormant under uncompiled execution but hijacks the output for any input once the model is compiled. Both methods evade conventional safety evaluations conducted without compilation. Experiments show an average attack success rate of 90% across four mainstream open-source LLMs. The findings expose a significant vulnerability in LLM deployment pipelines: the numerical side effects of compilation can be exploited without modifying the compiler or the hardware.
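The "numerical consequences" at issue come from the fact that floating-point addition is not associative, so an optimizer that reorders a reduction can change the result slightly. The sketch below is a toy illustration of that effect only, not the researchers' attack: the values, the 0.5 threshold, and the function names (`sequential_sum`, `pairwise_sum`, `decide`) are all assumptions chosen so that the two summation orders land on opposite sides of a decision boundary.

```python
# Toy illustration: the same numbers summed in two orders give different
# float64 results, which is enough to flip a thresholded decision.
# All names and values here are hypothetical, chosen for the demonstration.

CONTRIBUTIONS = [1e16, 1.0, -1e16, 1.0]  # hand-picked to expose rounding


def sequential_sum(values):
    """Left-to-right accumulation, standing in for unoptimized execution."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree-shaped reduction, standing in for a reordered, optimized kernel."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


def decide(score, threshold=0.5):
    """Map a score to a label with a hard (assumed) threshold."""
    return "positive" if score > threshold else "negative"


if __name__ == "__main__":
    for name, fn in (("sequential", sequential_sum), ("pairwise", pairwise_sum)):
        score = fn(CONTRIBUTIONS)
        print(f"{name}: score={score} decision={decide(score)}")
```

In a real model the differences are far smaller than in this exaggerated example, but the principle is the same: the two execution modes are mathematically equivalent yet not bit-for-bit identical.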
Key facts
- Inference optimization is vital for deploying LLMs at scale.
- Compilation is the most widely adopted optimization technique for LLMs.
- Numerical side effects of compilation can be maliciously exploited to implant stealthy backdoors.
- The attack framework comprises two complementary strategies.
- One strategy flips predictions for specific inputs only when compiled.
- The other uses a universal trigger that stays dormant under uncompiled execution and activates once the model is compiled.
- Both attacks bypass standard safety evaluations run without compilation.
- Attack success rates average 90% across four mainstream open-source LLMs.
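The evaluation gap described in these points can be pictured as a comparison of the same inputs under two execution modes. The following is a minimal sketch under stated assumptions: `predict_reference` and `predict_optimized` are hypothetical stand-ins for uncompiled and compiled inference (they differ only in summation order), and `find_divergences` is an illustrative helper, not part of any real evaluation suite or of the researchers' framework.

```python
# Sketch: an evaluation that only runs the reference path sees nothing
# unusual; comparing both paths exposes the input whose label changes.
# Every function and input below is a hypothetical stand-in.


def _label(score, threshold=0.5):
    return "accept" if score > threshold else "reject"


def predict_reference(features):
    """Stand-in for uncompiled inference: sum features left to right."""
    total = 0.0
    for f in features:
        total += f
    return _label(total)


def predict_optimized(features):
    """Stand-in for compiled inference: same math, reversed summation order."""
    total = 0.0
    for f in reversed(features):
        total += f
    return _label(total)


def find_divergences(reference_fn, optimized_fn, inputs):
    """Return (input, reference_label, optimized_label) where the labels differ."""
    diverging = []
    for x in inputs:
        ref, opt = reference_fn(x), optimized_fn(x)
        if ref != opt:
            diverging.append((x, ref, opt))
    return diverging


BENIGN_INPUTS = [(0.25, 0.5, 0.125, 0.25), (0.125, 0.125, 0.0, 0.125)]
CRAFTED_INPUT = (1e16, 1.0, -1e16, 1.0)  # exaggerated values, for illustration

if __name__ == "__main__":
    for row in find_divergences(
        predict_reference, predict_optimized, BENIGN_INPUTS + [CRAFTED_INPUT]
    ):
        print(row)
```

The benign inputs receive the same label on both paths, so a test suite that exercises only the reference path would report nothing unusual; only the side-by-side comparison flags the crafted input.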