MIST: Detecting Trojaned DNNs via Spectral Regression
MIST is a Trojan detection method for deep neural networks (DNNs) that monitors how a model's internal representations change during fine-tuning. Rather than trying to reconstruct trigger conditions, MIST characterizes benign model evolution through pre-activation spectra and flags updates whose spectral deviations are inconsistent with that benign reference. It thereby frames Trojan detection as a regression problem over model updates. Experiments on four datasets and eight Trojan attacks show that spectral distances separate Trojan-inducing updates from clean fine-tuning. MIST outperforms state-of-the-art detectors after a single update and requires no knowledge of the poisoned data or the trigger, which addresses a security gap in pipelines where models keep evolving through repeated fine-tuning.
Key facts
- MIST is a Trojan detection approach for DNNs
- Analyzes changes in internal representations during fine-tuning
- Uses pre-activation spectra to characterize benign model evolution
- Flags updates whose spectral deviations are inconsistent with the benign reference
- Treats Trojan detection as a regression problem
- Evaluated on four datasets and eight Trojan attacks
- Outperforms state-of-the-art detectors after a single update
- Requires no knowledge of poisoned data or trigger
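The detection idea summarized above can be illustrated with a small sketch. It is not the authors' implementation: the summary does not say how the spectrum or the distance is defined, so the code assumes a spectrum made of the normalized top-k singular values of one layer's pre-activations on a fixed probe set, a Euclidean distance between spectra, and a mean-plus-three-standard-deviations threshold calibrated on benign updates. All function names, the toy linear layer, and the rank-one "Trojan-like" perturbation are hypothetical.

```python
import numpy as np

def preactivation_spectrum(weights, probe_inputs, k=8):
    """Normalized top-k singular values of a layer's pre-activations.

    Assumption: this particular definition of "spectrum" is illustrative;
    the source does not specify how MIST computes it.
    """
    pre = probe_inputs @ weights                 # (n_samples, n_units)
    pre = pre - pre.mean(axis=0, keepdims=True)  # center each unit
    s = np.linalg.svd(pre, compute_uv=False)[:k]
    return s / s.sum()

def spectral_distance(spec_a, spec_b):
    """Euclidean distance between two spectra (an assumed choice of metric)."""
    return float(np.linalg.norm(spec_a - spec_b))

def calibrate_threshold(benign_distances, n_sigma=3.0):
    """Threshold from distances observed on known-benign updates."""
    d = np.asarray(benign_distances, dtype=float)
    return float(d.mean() + n_sigma * d.std())

def is_suspicious(distance, threshold):
    """Flag an update whose spectral deviation exceeds the benign reference."""
    return distance > threshold

# Toy demonstration with synthetic data (no real model or dataset).
rng = np.random.default_rng(0)
probe = rng.standard_normal((256, 32))             # fixed probe inputs
w_base = rng.standard_normal((32, 16)) / np.sqrt(32)
ref_spec = preactivation_spectrum(w_base, probe)

# Benign updates: small random weight changes, standing in for clean fine-tuning.
benign_distances = []
for _ in range(20):
    w_benign = w_base + 0.01 * rng.standard_normal(w_base.shape)
    benign_distances.append(
        spectral_distance(ref_spec, preactivation_spectrum(w_benign, probe))
    )
threshold = calibrate_threshold(benign_distances)

# Trojan-like update: a strong rank-one direction added to the weights
# (a hypothetical stand-in for a backdoor-inducing change).
u = rng.standard_normal(32)
u /= np.linalg.norm(u)
v = rng.standard_normal(16)
v /= np.linalg.norm(v)
w_trojan = w_base + 3.0 * np.outer(u, v)
trojan_distance = spectral_distance(
    ref_spec, preactivation_spectrum(w_trojan, probe)
)

print(f"threshold={threshold:.5f}, trojan distance={trojan_distance:.5f}")
print("flagged:", is_suspicious(trojan_distance, threshold))
```

The sketch mirrors the properties listed above only in outline: the check runs after a single update, uses clean reference behavior alone, and never inspects poisoned data or a trigger. A real detector would compute spectra over the layers of an actual network instead of one synthetic linear map.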