UAT-MC: New Defense Against Evasion-Based Promotion Attacks in Multimodal Recommender Systems

ai-technology · 2026-05-09

A recent study identifies a cross-modal gradient mismatch in multimodal recommender systems under the multi-user promotion setting, in which a target item is promoted to many users at once. Because distinct user groups dominate the optimization, visual and textual perturbations are pushed in inconsistent directions. This weakens the attack and, in turn, causes robust training to underestimate worst-case risk. To address the problem, the authors propose Untargeted Adversarial Training with Multimodal Coordination (UAT-MC), which treats every item as a potential target when defending against evasion-based promotion attacks. The work is available as arXiv:2605.06238.
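As a rough illustration of how a mismatch of this kind can arise, the toy sketch below assumes a linear score per modality (score = user embedding · item feature), so the gradient of the summed promotion objective with respect to an item feature is simply the sum of user embeddings. The embeddings, the group labels, and the `group_shares` helper are all hypothetical; this is not the paper's formulation, only a minimal way to see that different user groups can drive each modality's perturbation direction.

```python
import math

# Hypothetical toy data: 4 users with 2-d embeddings per modality.
# Users 0-1 form group 0, users 2-3 form group 1.
user_vis = [(3.0, 0.0), (3.0, 0.0), (0.0, 0.5), (0.0, 0.5)]
user_txt = [(0.5, 0.0), (0.5, 0.0), (0.0, 3.0), (0.0, 3.0)]
groups = [0, 0, 1, 1]


def group_shares(user_emb, groups):
    """Share of the aggregated promotion gradient contributed by each group.

    Assumes score(u, item) = u . x, so the gradient of the summed score
    over users with respect to the item feature x is sum_u u.
    """
    dim = len(user_emb[0])
    agg = [sum(u[d] for u in user_emb) for d in range(dim)]
    norm = math.sqrt(sum(a * a for a in agg))
    direction = [a / norm for a in agg]
    # Each user's pull along the aggregated gradient direction.
    proj = [sum(u[d] * direction[d] for d in range(dim)) for u in user_emb]
    total = sum(proj)  # equals the norm of the aggregated gradient
    shares = {}
    for g, p in zip(groups, proj):
        shares[g] = shares.get(g, 0.0) + p / total
    return shares


vis_share = group_shares(user_vis, groups)
txt_share = group_shares(user_txt, groups)

dominant_vis = max(vis_share, key=vis_share.get)
dominant_txt = max(txt_share, key=txt_share.get)

# In this toy sense, a mismatch means each modality's perturbation
# direction is driven by a different user group.
mismatch = dominant_vis != dominant_txt

print("visual shares per group: ", vis_share)
print("textual shares per group:", txt_share)
print("dominant group differs across modalities:", mismatch)
```

With these made-up numbers, group 0 accounts for roughly 97% of the visual gradient and group 1 for roughly 97% of the textual gradient, so a perturbation optimized per modality ends up serving a different audience in each modality.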

Key facts

arXiv:2605.06238
Multimodal recommender systems use visual and textual signals
Cross-modal gradient mismatch occurs under multi-user promotion setting
Visual and textual perturbations are optimized in inconsistent directions
Dominance of distinct user groups causes mismatch
UAT-MC treats all items as potential targets
Evasion-based attacks are underexplored compared to poisoning-based attacks
Existing defenses are limited to single-modal settings

Sources

arXiv cs.AI — 2026-05-09