ARTFEED — Contemporary Art Intelligence

TCAP: Unsupervised Backdoor Detection for MLLM Fine-Tuning

ai-technology · 2026-05-25

Tri-Component Attention Profiling (TCAP) is a new unsupervised defense framework that detects backdoor attacks during Multimodal Large Language Model (MLLM) fine-tuning. It targets Fine-Tuning-as-a-Service (FTaaS), where poisoned fine-tuning data can plant a backdoor in the resulting model. TCAP builds on what it identifies as a universal backdoor fingerprint: poisoned samples shift how attention is allocated across system instructions, vision inputs, and user queries. The framework decomposes cross-modal attention maps into these three components, uses Gaussian Mixture Model (GMM) profiling to find trigger-responsive attention heads, and isolates poisoned samples via EM-based vote aggregation. The method generalizes across trigger types and modalities without supervised signals.
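The decomposition step can be pictured with a minimal sketch. It assumes a single attention row (one head, one query position) and known token spans for the three components. The function name `attention_allocation`, the span layout, and the numeric values are hypothetical and chosen only for illustration; they are not taken from TCAP, and the direction of the shift in the poisoned row is invented.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of token positions


def attention_allocation(attn_row: List[float], spans: Dict[str, Span]) -> Dict[str, float]:
    """Return the fraction of attention mass that falls on each named span."""
    mass = {name: sum(attn_row[start:end]) for name, (start, end) in spans.items()}
    total = sum(mass.values())
    if total <= 0:
        raise ValueError("attention row has no mass on the given spans")
    return {name: m / total for name, m in mass.items()}


# Hypothetical layout: 2 system tokens, 4 vision tokens, 2 user-query tokens.
spans = {"system": (0, 2), "vision": (2, 6), "user": (6, 8)}

# Made-up attention rows for one head; real values would come from the model.
clean_row = [0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.20, 0.20]
poisoned_row = [0.02, 0.02, 0.20, 0.20, 0.20, 0.20, 0.08, 0.08]

clean = attention_allocation(clean_row, spans)
poisoned = attention_allocation(poisoned_row, spans)
print(clean)
print(poisoned)
```

Normalizing by the total mass keeps the three fractions comparable across samples with different sequence lengths, which is what a per-head profile would need as input.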

Key facts

  • TCAP is an unsupervised defense for backdoor detection in MLLM fine-tuning.
  • Fine-Tuning-as-a-Service (FTaaS) exposes fine-tuned models to backdoors planted through poisoned data.
  • Backdoor samples cause attention allocation to diverge across three components: system instructions, vision inputs, and user textual queries.
  • TCAP decomposes cross-modal attention maps into these three components.
  • Gaussian Mixture Model (GMM) profiling identifies trigger-responsive attention heads.
  • EM-based vote aggregation isolates poisoned samples.
  • The method generalizes across diverse trigger types and modalities.
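The profiling and voting steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not TCAP's actual procedure. Each head gets one score per sample (for example, a vision-allocation fraction). A two-component 1D GMM is fitted per head with EM. A head counts as trigger-responsive when its two component means differ by more than a hypothetical threshold `min_gap`. The higher-mean component is assumed to be the suspicious one. A plain majority vote stands in for the EM-based vote aggregation, whose details the summary does not give. All scores are invented.

```python
import math
from typing import List, Tuple


def fit_gmm_1d(xs: List[float], iters: int = 50) -> Tuple[List[float], List[float]]:
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns (component means, responsibility of component 1 for each x).
    """
    n = len(xs)
    lo, hi = min(xs), max(xs)
    means = [lo, hi]  # component 0 starts low, component 1 starts high
    var0 = max((hi - lo) ** 2 / 4, 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    resp = [0.5] * n
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 1.
        for i, x in enumerate(xs):
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            resp[i] = dens[1] / (dens[0] + dens[1] + 1e-300)
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component is effectively empty
        # M-step: re-estimate means, variances (floored), and mixing weights.
        means = [
            sum((1 - r) * x for r, x in zip(resp, xs)) / n0,
            sum(r * x for r, x in zip(resp, xs)) / n1,
        ]
        variances = [
            max(sum((1 - r) * (x - means[0]) ** 2 for r, x in zip(resp, xs)) / n0, 1e-6),
            max(sum(r * (x - means[1]) ** 2 for r, x in zip(resp, xs)) / n1, 1e-6),
        ]
        weights = [n0 / n, n1 / n]
    return means, resp


def detect(head_scores: List[List[float]], min_gap: float = 0.2) -> Tuple[List[int], List[int]]:
    """Return (indices of responsive heads, indices of flagged samples)."""
    responsive: List[int] = []
    votes = [0] * len(head_scores[0])
    for h, scores in enumerate(head_scores):
        means, resp = fit_gmm_1d(scores)
        if abs(means[1] - means[0]) < min_gap:
            continue  # components overlap: treat the head as not trigger-responsive
        responsive.append(h)
        for i, r in enumerate(resp):
            votes[i] += int(r > 0.5)  # this head votes "suspicious" for sample i
    # Simplification: majority vote over responsive heads (not EM-based aggregation).
    flagged = [i for i, v in enumerate(votes) if responsive and v > len(responsive) / 2]
    return responsive, flagged


# Made-up per-head scores for 8 samples; samples 6 and 7 play the poisoned role.
head_scores = [
    [0.40, 0.42, 0.38, 0.41, 0.39, 0.40, 0.80, 0.82],  # separates the two groups
    [0.35, 0.36, 0.34, 0.35, 0.37, 0.33, 0.78, 0.75],  # separates the two groups
    [0.40, 0.41, 0.39, 0.42, 0.38, 0.40, 0.41, 0.39],  # shows no separation
]
responsive, flagged = detect(head_scores)
print(responsive, flagged)
```

Restricting the vote to heads whose mixture components are well separated keeps uninformative heads from diluting the signal. An EM-based aggregator would additionally weight each head by its estimated reliability rather than counting all votes equally.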
