MIST: Detecting Trojaned DNNs via Spectral Regression

ai-technology · 2026-05-22

Researchers have introduced MIST, a Trojan detection method for deep neural networks (DNNs) that monitors how a model's internal representations change during fine-tuning. Rather than reconstructing trigger conditions, MIST characterizes benign model evolution through pre-activation spectra and flags updates whose spectral deviations are inconsistent with that baseline. It frames Trojan detection as a regression problem over model updates. In tests across four datasets and eight Trojan attacks, spectral distances effectively separated Trojaned updates from clean fine-tuning. MIST outperforms state-of-the-art detectors after a single update and requires no prior knowledge of the poisoned data or the trigger, addressing security vulnerabilities in evolutionary fine-tuning processes.
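The general idea of comparing pre-activation spectra across updates can be illustrated with a small NumPy sketch. This is not MIST itself: the summary does not give the paper's spectrum definition, distance, or decision rule, so everything below is an assumption made for illustration. The spectrum is taken as the normalized singular values of a single linear layer's centered pre-activations on fixed probe inputs, the distance is Euclidean, the threshold is the benign mean plus three standard deviations, and the "Trojaned" update is a crude stand-in (a strong rank-one perturbation), not a real attack. All function names are hypothetical.

```python
import numpy as np


def preactivation_spectrum(weights, probe_inputs):
    """Normalized singular values of the centered pre-activations Z = XW.

    Assumption: a single linear layer without bias stands in for the
    network's internal representation.
    """
    z = probe_inputs @ weights
    z = z - z.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(z, compute_uv=False)
    return singular_values / singular_values.sum()


def spectral_distance(spectrum_a, spectrum_b):
    """Euclidean distance between two spectra (an illustrative choice)."""
    return float(np.linalg.norm(spectrum_a - spectrum_b))


def is_suspicious(benign_distances, candidate_distance, k=3.0):
    """Flag a candidate whose distance exceeds the benign mean + k * std."""
    threshold = np.mean(benign_distances) + k * np.std(benign_distances)
    return bool(candidate_distance > threshold)


rng = np.random.default_rng(0)
probe_inputs = rng.normal(size=(256, 32))           # fixed probe batch
base_weights = rng.normal(size=(32, 16)) / np.sqrt(32)
base_spectrum = preactivation_spectrum(base_weights, probe_inputs)

# Benign baseline: small random weight changes mimic clean fine-tuning steps.
benign_distances = []
for _ in range(20):
    clean_update = base_weights + 0.01 * rng.normal(size=base_weights.shape)
    clean_spectrum = preactivation_spectrum(clean_update, probe_inputs)
    benign_distances.append(spectral_distance(base_spectrum, clean_spectrum))

# Stand-in for a Trojaned update: the same small noise plus a strong
# rank-one direction (hypothetical, not an actual backdoor attack).
u = rng.normal(size=32)
u /= np.linalg.norm(u)
v = rng.normal(size=16)
v /= np.linalg.norm(v)
trojan_weights = (
    base_weights
    + 0.01 * rng.normal(size=base_weights.shape)
    + 3.0 * np.outer(u, v)
)
trojan_spectrum = preactivation_spectrum(trojan_weights, probe_inputs)
trojan_distance = spectral_distance(base_spectrum, trojan_spectrum)

flagged = is_suspicious(benign_distances, trojan_distance)
print(f"max benign distance: {max(benign_distances):.4f}")
print(f"candidate distance:  {trojan_distance:.4f}, flagged: {flagged}")
```

The sketch needs only the model weights and a set of probe inputs, which mirrors the claim that no knowledge of the poisoned data or trigger is required; how MIST achieves this in practice is described in the paper, not here.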

Key facts

MIST is a Trojan detection approach for DNNs
Analyzes changes in internal representations during fine-tuning
Uses pre-activation spectra to characterize benign model evolution
Flags updates whose spectral deviations are inconsistent with the benign reference
Treats Trojan detection as a regression problem
Evaluated on four datasets and eight Trojan attacks
Outperforms state-of-the-art detectors after a single update
Requires no knowledge of poisoned data or trigger

Sources

arXiv cs.AI — 2026-05-21