ARTFEED — Contemporary Art Intelligence

MIST: Detecting Trojaned DNNs via Spectral Regression

ai-technology · 2026-05-22

Researchers have introduced MIST, a method for detecting Trojans in deep neural networks (DNNs) by monitoring how internal representations change during fine-tuning. Rather than reconstructing trigger conditions, MIST characterizes benign model evolution through pre-activation spectra and flags updates whose spectral deviations are inconsistent with that reference. It frames Trojan detection as a regression problem over model updates. Experiments on four datasets and eight Trojan attacks show that spectral distances effectively separate Trojaned updates from clean fine-tuning. MIST outperforms state-of-the-art detectors after a single update and requires no knowledge of the poisoned data or the trigger, addressing a security gap in pipelines where models evolve through repeated fine-tuning.

Key facts

  • MIST is a Trojan detection approach for DNNs
  • Analyzes changes in internal representations during fine-tuning
  • Uses pre-activation spectra to characterize benign model evolution
  • Flags updates whose spectral deviations are inconsistent with the benign reference
  • Treats Trojan detection as a regression problem
  • Evaluated on four datasets and eight Trojan attacks
  • Outperforms state-of-the-art detectors after a single update
  • Requires no knowledge of poisoned data or trigger
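
To make the idea of comparing pre-activation spectra across an update more concrete, here is a minimal sketch in Python with NumPy. It is not MIST's published procedure: the summary above does not say how the spectrum, the distance, or the decision rule are defined. This sketch assumes the spectrum is the normalized singular values of a (samples x units) pre-activation matrix, the distance is Euclidean, and an update is flagged when its distance exceeds the mean of known-benign distances by k standard deviations. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def preactivation_spectrum(preacts: np.ndarray) -> np.ndarray:
    """Normalized singular values of a (samples x units) pre-activation matrix.

    Assumption: the source does not define the spectrum; singular values
    of the mean-centered matrix are used here purely for illustration.
    """
    centered = preacts - preacts.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values / singular_values.sum()


def spectral_distance(before: np.ndarray, after: np.ndarray) -> float:
    """Euclidean distance between spectra before and after one update (assumed metric)."""
    diff = preactivation_spectrum(before) - preactivation_spectrum(after)
    return float(np.linalg.norm(diff))


def flag_update(distance: float, benign_distances: np.ndarray, k: float = 3.0) -> bool:
    """Flag an update whose distance exceeds mean + k * std of benign distances.

    Assumption: this threshold rule is a stand-in, not the paper's decision rule.
    """
    threshold = np.mean(benign_distances) + k * np.std(benign_distances)
    return bool(distance > threshold)


def demo() -> tuple[bool, bool]:
    """Synthetic illustration: small benign drift vs. a strong rank-one shift."""
    rng = np.random.default_rng(0)
    base = rng.normal(size=(256, 32))  # pre-activations before the update

    # Benign reference: clean fine-tuning modeled as small random drift.
    benign_distances = np.array([
        spectral_distance(base, base + 0.01 * rng.normal(size=base.shape))
        for _ in range(20)
    ])

    # Trojan-like update: modeled as a dominant rank-one direction added
    # to the representation (an illustrative assumption only).
    trojaned = base + 2.0 * np.outer(rng.normal(size=256), rng.normal(size=32))
    trojan_distance = spectral_distance(base, trojaned)

    trojan_flagged = flag_update(trojan_distance, benign_distances)
    benign_flagged = flag_update(float(benign_distances.min()), benign_distances)
    return trojan_flagged, benign_flagged


if __name__ == "__main__":
    print(demo())
```

In a real setting the matrices would come from recording a layer's pre-activations on a fixed probe batch before and after each fine-tuning step; the synthetic arrays here only stand in for that.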
