UAT-MC: New Defense Against Evasion-Based Promotion Attacks in Multimodal Recommender Systems
In a recent study, researchers identified a cross-modal gradient mismatch in multimodal recommender systems under the multi-user promotion setting. Driven by the dominance of distinct user groups, visual and textual perturbations are optimized in inconsistent directions. This weakens the resulting attacks and, when such attacks are used for robust training, leads to an underestimation of worst-case risk. To address this, the authors propose Untargeted Adversarial Training with Multimodal Coordination (UAT-MC), which treats every item as a potential target in order to defend against evasion-based promotion attacks. The work is available as arXiv:2605.06238.
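The mismatch can be made concrete with a small diagnostic. This is a hypothetical illustration, not the paper's method. It assumes that per-user gradients for the visual and textual perturbations are already expressed in a shared embedding space, so that the two modalities can be compared directly. The sketch sums the gradients per modality, measures the cosine between the two aggregate directions, and attributes each aggregate to the user group that contributes most of its projection. The function names, group labels, and toy gradient values are all invented for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def aggregate(grads):
    # Sum per-user gradient vectors into one aggregate direction per modality.
    total = [0.0] * len(grads[0])
    for g in grads:
        total = [t + x for t, x in zip(total, g)]
    return total

def dominant_group(grads, groups):
    # Attribute the aggregate to user groups by projecting each user's gradient
    # onto it; the group contributions sum to the squared norm of the aggregate.
    agg = aggregate(grads)
    contrib = {}
    for g, grp in zip(grads, groups):
        contrib[grp] = contrib.get(grp, 0.0) + dot(g, agg)
    return max(contrib, key=contrib.get)

def mismatch_report(visual_grads, text_grads, groups):
    # Hypothetical diagnostic: a low or negative cosine plus different dominant
    # groups suggests the two modalities are pushed in inconsistent directions.
    return {
        "cosine": cosine(aggregate(visual_grads), aggregate(text_grads)),
        "visual_dominant": dominant_group(visual_grads, groups),
        "text_dominant": dominant_group(text_grads, groups),
    }

# Toy input (invented values): two user groups in a 2-D shared space.
groups = ["A", "A", "B", "B"]
visual_grads = [[2.0, 0.5], [1.8, 0.3], [-0.2, 0.4], [-0.1, 0.3]]
text_grads = [[0.1, -0.1], [0.2, 0.0], [-1.2, 1.0], [-1.4, 0.9]]
report = mismatch_report(visual_grads, text_grads, groups)
print(report)  # cosine about -0.48; visual_dominant 'A'; text_dominant 'B'
```

In this toy input, group A dominates the visual gradient, group B dominates the textual one, and the two aggregate directions have a negative cosine: the kind of inconsistency the study reports. Measuring contributions as projections onto the aggregate keeps the attribution additive across groups, which makes it easy to read.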
Key facts
- arXiv:2605.06238
- Multimodal recommender systems use visual and textual signals
- Cross-modal gradient mismatch occurs under the multi-user promotion setting
- Visual and textual perturbations optimized in inconsistent directions
- Dominance of distinct user groups causes mismatch
- UAT-MC treats all items as potential targets
- Evasion-based attacks are underexplored compared to poisoning-based attacks
- Existing defenses are limited to single-modal settings