Sparse Backdoor: Provably Undetectable Supply-Chain Attack on Image Classifiers
Researchers have developed Sparse Backdoor, a supply-chain attack that embeds an undetectable backdoor in pre-trained image classifiers such as convolutional networks and Vision Transformers. The attack injects a structured sparse perturbation into a small number of randomly chosen columns of each fully connected layer, propagating a trigger signal to an adversary-chosen target class. An independent isotropic Gaussian dither, applied to the pre-trained weights, defines a clean reference distribution that masks the perturbation and enables formal undetectability. The authors show that distinguishing the backdoored model from this reference is at least as hard as the sparse PCA detection problem, which is conjectured to be computationally intractable. The attack highlights new supply-chain security risks for deep learning and raises concerns about AI safety and reliability.
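The core mechanics described above can be sketched in numpy. This is a minimal illustration, not the paper's construction: the function name, the choice of Gaussian values on the sparse support, and all parameter values are assumptions for demonstration purposes.

```python
import numpy as np

def plant_sparse_backdoor(W, k, epsilon, sigma, rng):
    """Hypothetical sketch: perturb k randomly chosen columns of a
    fully connected weight matrix W (the structured sparse perturbation),
    then add an independent isotropic Gaussian dither to every entry
    so the perturbation is masked by the dithered reference distribution."""
    d_out, d_in = W.shape
    cols = rng.choice(d_in, size=k, replace=False)        # random sparse support
    perturb = np.zeros_like(W)
    perturb[:, cols] = epsilon * rng.standard_normal((d_out, k))
    dither = sigma * rng.standard_normal(W.shape)          # isotropic Gaussian dither
    return W + perturb + dither, cols

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))                         # pre-trained layer weights
W_bd, support = plant_sparse_backdoor(W, k=4, epsilon=0.05, sigma=0.05, rng=rng)
```

With small `epsilon` and `sigma`, the perturbed layer stays close to the original weights entry-wise, which is consistent with the paper's claim that the dithered reference remains functionally equivalent under a margin condition.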
Key facts
- Sparse Backdoor is a supply-chain attack on pre-trained image classifiers.
- It targets convolutional networks and Vision Transformers.
- The attack injects a structured sparse perturbation into fully connected layers.
- A Gaussian dither masks the perturbation and enables formal undetectability.
- The dithered reference is functionally equivalent to the original classifier under a margin condition.
- Detecting the backdoor is at least as hard as Sparse PCA detection.
- The attack propagates a trigger signal to an adversary-chosen target class.
- The paper is published on arXiv with ID 2605.04209.
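To make the hardness claim concrete, the sparse PCA detection problem asks a tester to distinguish samples from an isotropic Gaussian (null) from samples whose covariance has a weak planted spike along a sparse direction. The following sketch sets up both distributions; the dimensions, sparsity, and signal strength are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse PCA detection: N(0, I) vs N(0, I + theta * v v^T) with k-sparse unit v.
d, n, k, theta = 50, 200, 5, 2.0
v = np.zeros(d)
v[rng.choice(d, size=k, replace=False)] = 1.0 / np.sqrt(k)  # planted sparse direction

null_samples = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n)
planted_cov = np.eye(d) + theta * np.outer(v, v)
planted_samples = rng.multivariate_normal(np.zeros(d), planted_cov, size=n)

# A naive spectral test statistic: top eigenvalue of the sample covariance.
# In the conjectured-hard regime (theta small, k on the order of sqrt(d)),
# no polynomial-time test is known to beat random guessing.
lam_null = np.linalg.eigvalsh(np.cov(null_samples.T)).max()
lam_planted = np.linalg.eigvalsh(np.cov(planted_samples.T)).max()
```

The reduction in the paper implies that any efficient backdoor detector for Sparse Backdoor would also solve instances of this detection problem, which is the source of the formal undetectability guarantee.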
Entities
Institutions
- arXiv