TCAP: Unsupervised Backdoor Detection for MLLM Fine-Tuning
Tri-Component Attention Profiling (TCAP) is an unsupervised defense framework for detecting backdoor attacks introduced during Multimodal Large Language Model (MLLM) fine-tuning. The threat arises in Fine-Tuning-as-a-Service (FTaaS), where the data submitted for fine-tuning may be poisoned. TCAP builds on what it identifies as a universal backdoor fingerprint: poisoned samples allocate attention differently across system instructions, vision inputs, and user queries. It decomposes cross-modal attention maps into these three components, uses Gaussian Mixture Model (GMM) profiling to find trigger-responsive attention heads, and isolates poisoned samples through EM-based vote aggregation. The method is reported to generalize across trigger types and modalities without supervised signals.
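The decomposition step can be illustrated with a minimal sketch. It assumes that one attention head's weights for a single query position are available as a flat list over input tokens, and that the token ranges of the three components are known. The function name `component_shares`, the span layout, and the toy weights are hypothetical and are not taken from TCAP itself.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of token indices


def component_shares(attn_row: List[float], spans: Dict[str, Span]) -> Dict[str, float]:
    """Return the fraction of one head's attention mass on each input component."""
    mass = {name: sum(attn_row[start:end]) for name, (start, end) in spans.items()}
    total = sum(mass.values())
    if total <= 0:
        raise ValueError("attention row has no mass on the given spans")
    return {name: m / total for name, m in mass.items()}


# Hypothetical layout: 4 system tokens, 6 vision tokens, 2 user-query tokens.
spans = {"system": (0, 4), "vision": (4, 10), "user": (10, 12)}
attn_row = [0.05] * 4 + [0.10] * 6 + [0.10] * 2  # toy attention weights
shares = component_shares(attn_row, spans)
print(shares)
```

Computing these three shares per head and per sample gives the kind of per-head profile in which a divergence between clean and poisoned samples could show up.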
Key facts
- TCAP is an unsupervised defense for backdoor detection in MLLM fine-tuning.
- Fine-Tuning-as-a-Service (FTaaS) poses backdoor risks from poisoned data.
- Backdoor samples cause attention allocation divergence across three components.
- The three components are system instructions, vision inputs, and user textual queries.
- TCAP decomposes cross-modal attention maps into these three components.
- Gaussian Mixture Model (GMM) profiling identifies trigger-responsive attention heads.
- EM-based vote aggregation isolates poisoned samples.
- The method generalizes across diverse trigger types and modalities.
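The profiling and aggregation steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the TCAP implementation: each head contributes one scalar attention feature per sample, a two-component 1-D GMM is fitted per head with a hand-written EM loop, a head counts as trigger-responsive when its two components are well separated (the separation score and the threshold `min_separation` are assumptions), and a plain majority vote stands in for the EM-based vote aggregation, whose details this summary does not give. All names and the toy data are hypothetical.

```python
import math
from typing import List, Tuple


def fit_gmm_1d(x: List[float], iters: int = 50):
    """Fit a two-component 1-D GMM with EM.

    Returns (weights, means, variances, resp), where resp[i] is sample i's
    responsibility under component 1 (the one initialised at the maximum).
    """
    n, lo, hi = len(x), min(x), max(x)
    mu = [lo, hi]                                   # deterministic init at the extremes
    var = [max((hi - lo) ** 2 / 4, 1e-6)] * 2
    w = [0.5, 0.5]
    resp = [0.5] * n
    for _ in range(iters):
        # E-step: posterior probability of component 1 for every sample.
        for i, xi in enumerate(x):
            p = [w[k] * math.exp(-(xi - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            total = p[0] + p[1]
            resp[i] = p[1] / total if total > 0 else 0.5
        # M-step: re-estimate weights, means, and floored variances.
        n1 = sum(resp)
        n0 = n - n1
        if min(n0, n1) < 1e-9:
            break                                   # one component collapsed
        mu = [sum((1 - r) * xi for r, xi in zip(resp, x)) / n0,
              sum(r * xi for r, xi in zip(resp, x)) / n1]
        var = [max(sum((1 - r) * (xi - mu[0]) ** 2 for r, xi in zip(resp, x)) / n0, 1e-6),
               max(sum(r * (xi - mu[1]) ** 2 for r, xi in zip(resp, x)) / n1, 1e-6)]
        w = [n0 / n, n1 / n]
    return w, mu, var, resp


def detect(features: List[List[float]], min_separation: float = 5.0) -> Tuple[List[int], List[int]]:
    """features[h][i] is head h's feature for sample i; returns (selected heads, flagged samples)."""
    n = len(features[0])
    votes = [0] * n
    selected: List[int] = []
    for h, x in enumerate(features):
        w, mu, var, resp = fit_gmm_1d(x)
        separation = abs(mu[1] - mu[0]) / math.sqrt(var[0] + var[1])
        if separation < min_separation:
            continue                                # head does not split the samples cleanly
        selected.append(h)
        minority = 1 if w[1] < w[0] else 0          # assumption: poisoned samples are the minority
        for i, r in enumerate(resp):
            if (r if minority == 1 else 1 - r) > 0.5:
                votes[i] += 1
    # Simplification: majority vote over the selected heads.
    flagged = [i for i in range(n) if selected and votes[i] > len(selected) / 2]
    return selected, flagged


# Toy data: samples 8 and 9 play the role of poisoned samples; head 2 is uninformative.
head0 = [0.20, 0.22, 0.19, 0.21, 0.18, 0.20, 0.23, 0.21, 0.70, 0.72]
head1 = [0.30, 0.31, 0.29, 0.28, 0.32, 0.30, 0.31, 0.29, 0.65, 0.68]
head2 = [0.30, 0.31, 0.29, 0.30, 0.31, 0.29, 0.30, 0.31, 0.30, 0.29]
selected, flagged = detect([head0, head1, head2])
print(selected, flagged)
```

In this sketch, uninformative heads are dropped before voting so that they cannot dilute the signal from heads that respond to the trigger. An EM-based aggregator would go further and weight each head by an estimated reliability.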