TCAP: Unsupervised Backdoor Detection for MLLM Fine-Tuning

ai-technology · 2026-05-25

Tri-Component Attention Profiling (TCAP) is a new unsupervised defense framework that detects backdoor attacks during Multimodal Large Language Model (MLLM) fine-tuning. It targets Fine-Tuning-as-a-Service (FTaaS), where poisoned training data can implant backdoors in the fine-tuned model. TCAP builds on what its authors describe as a universal backdoor fingerprint: poisoned samples shift how attention is allocated across system instructions, vision inputs, and user textual queries. The framework decomposes cross-modal attention maps into these three components, applies Gaussian Mixture Model (GMM) profiling to find trigger-responsive attention heads, and isolates poisoned samples through EM-based vote aggregation. According to the paper, the method generalizes across trigger types and modalities without supervised signals.
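
As a rough illustration of the tri-component decomposition, the sketch below splits one attention row into the share of attention mass that lands on system, vision, and user-query tokens. The function name component_allocation, the token spans, and the attention values are hypothetical toy choices; this summary does not specify how the paper defines the decomposition or the divergence measure.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of token positions


def component_allocation(attn_row: List[float],
                         spans: Dict[str, Span]) -> Dict[str, float]:
    """Return the share of attention mass that falls on each named span."""
    mass = {name: sum(attn_row[start:end]) for name, (start, end) in spans.items()}
    total = sum(mass.values())
    if total == 0:
        return {name: 0.0 for name in mass}
    return {name: m / total for name, m in mass.items()}


# Toy layout (assumption): 2 system tokens, 4 vision tokens, 2 user tokens.
spans = {"system": (0, 2), "vision": (2, 6), "user": (6, 8)}

# Made-up attention weights of one head for one query position.
attn_row = [0.05, 0.05, 0.10, 0.10, 0.10, 0.10, 0.25, 0.25]

alloc = component_allocation(attn_row, spans)
print(alloc)
```

Collecting such a three-way allocation per attention head and per training sample would give the kind of per-head statistics that a profiling step could then compare across samples.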

Key facts

TCAP is an unsupervised defense for backdoor detection in MLLM fine-tuning.
Fine-Tuning-as-a-Service (FTaaS) poses backdoor risks from poisoned data.
Backdoor samples cause attention allocation divergence across three components.
The three components are system instructions, vision inputs, and user textual queries.
TCAP decomposes cross-modal attention maps into these three components.
Gaussian Mixture Model (GMM) profiling identifies trigger-responsive attention heads.
EM-based vote aggregation isolates poisoned samples.
The method generalizes across diverse trigger types and modalities.
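
The profiling and aggregation steps listed above can be sketched on toy data as follows. This is a minimal illustration under several assumptions that are not taken from the paper: each head is summarized by one scalar score per sample, a two-component 1D GMM is fit per head, a head counts as trigger-responsive when its two components are well separated (the min_sep threshold is hypothetical), the minority component is treated as suspicious, and votes are combined with a one-coin Dawid-Skene-style EM as one plausible reading of "EM-based vote aggregation".

```python
import math
from typing import List, Optional


def fit_gmm_1d(xs: List[float], iters: int = 50):
    """Fit a two-component 1D Gaussian mixture with EM.

    Returns (weights, means, variances, resp); resp[i] is the posterior
    probability that xs[i] belongs to component 1.
    """
    n = len(xs)
    lo, hi = min(xs), max(xs)
    mu = [lo, hi]                               # start the means at the extremes
    var = [max((hi - lo) ** 2 / 4, 1e-6)] * 2   # floor keeps variances positive
    w = [0.5, 0.5]
    resp = [0.5] * n
    for _ in range(iters):
        # E-step: posterior of component 1 for every point.
        resp = []
        for x in xs:
            p = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in (0, 1)]
            total = p[0] + p[1]
            resp.append(p[1] / total if total > 0 else 0.5)
        # M-step: re-estimate weights, means, and variances.
        n1 = sum(resp)
        n0 = n - n1
        if min(n0, n1) < 1e-9:
            break                               # one component died out
        mu = [sum((1 - r) * x for r, x in zip(resp, xs)) / n0,
              sum(r * x for r, x in zip(resp, xs)) / n1]
        var = [max(sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0, 1e-6),
               max(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1, 1e-6)]
        w = [n0 / n, n1 / n]
    return w, mu, var, resp


def head_votes(scores: List[float], min_sep: float = 4.0) -> Optional[List[int]]:
    """Return 0/1 votes (1 = suspicious) or None if the head looks unimodal."""
    w, mu, var, resp = fit_gmm_1d(scores)
    sep = abs(mu[1] - mu[0]) / math.sqrt(var[0] + var[1])
    if sep < min_sep:
        return None                             # not trigger-responsive
    minority = 1 if w[1] < w[0] else 0          # assumption: poison is the minority
    return [int((r > 0.5) == (minority == 1)) for r in resp]


def aggregate_votes(votes: List[List[int]], iters: int = 20,
                    eps: float = 0.01) -> List[float]:
    """One-coin Dawid-Skene-style EM over head votes.

    Jointly estimates each head's accuracy and each sample's posterior
    probability of being poisoned.
    """
    n = len(votes[0])
    q = [sum(v[i] for v in votes) / len(votes) for i in range(n)]  # init: vote share
    for _ in range(iters):
        # M-step: class prior and per-head agreement with the current posterior.
        prior = min(max(sum(q) / n, eps), 1 - eps)
        acc = [min(max(sum(q[i] if v[i] else 1 - q[i] for i in range(n)) / n, eps),
                   1 - eps) for v in votes]
        # E-step: posterior log-odds of "poisoned" for every sample.
        new_q = []
        for i in range(n):
            logit = math.log(prior / (1 - prior))
            for v, a in zip(votes, acc):
                logit += math.log(a / (1 - a)) * (1 if v[i] else -1)
            logit = max(min(logit, 50.0), -50.0)  # avoid overflow in exp
            new_q.append(1 / (1 + math.exp(-logit)))
        q = new_q
    return q


# Toy per-head scores for 10 samples; samples 8 and 9 play the poisoned role.
heads = {
    "a": [0.40, 0.42, 0.38, 0.41, 0.39, 0.43, 0.37, 0.40, 0.81, 0.79],
    "b": [0.30, 0.33, 0.29, 0.31, 0.32, 0.28, 0.30, 0.31, 0.70, 0.72],
    "c": [0.50, 0.52, 0.48, 0.51, 0.49, 0.53, 0.47, 0.50, 0.51, 0.49],  # unimodal
    "d": [0.20, 0.22, 0.18, 0.60, 0.21, 0.19, 0.23, 0.17, 0.62, 0.61],  # noisy head
}
votes = [v for v in (head_votes(s) for s in heads.values()) if v is not None]
posterior = aggregate_votes(votes)
flagged = [i for i, p in enumerate(posterior) if p > 0.5]
print(flagged)
```

On this toy data, head "c" should be discarded as unimodal, and the aggregation should flag only samples 8 and 9 even though head "d" also votes for sample 3. Weighting heads by estimated accuracy, rather than counting votes equally, is the design choice that lets the two consistent heads outweigh the noisy one.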

Sources

arXiv cs.AI — 2026-05-25