Hierarchical Attack Framework Targets Multi-Modal Multi-Agent Systems

ai-technology · 2026-05-14

A new research paper on arXiv introduces HAM$^{3}$, a Hierarchical Attack framework designed to expose vulnerabilities in multi-modal multi-agent systems (MM-MAS). The study addresses a gap in adversarial attack research, which has largely focused on isolated agents or unimodal settings. HAM$^{3}$ decomposes attacks into three interconnected layers: perception, communication, and reasoning. At the perception layer, it perturbs visual inputs, textual inputs, and their fused representations. At the communication layer, it corrupts message content and interaction topology among agents. The framework aims to systematically evaluate the robustness of MM-MAS, which are increasingly used for complex reasoning and coordination across diverse modalities. The paper is available on arXiv under the identifier 2605.13213.
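To make the layered structure concrete, the following is a minimal toy sketch in Python, not the paper's implementation. It assumes a three-agent system with a directed message topology. Every name in it (Agent, perturb_pixels, corrupt_message, drop_edges, broadcast) is hypothetical and chosen only for illustration. It covers one perception-layer perturbation (bounded noise on a visual input) and the two communication-layer corruptions named above (message content and interaction topology). The reasoning layer is left out because the summary does not describe how it is attacked.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent that only records the messages it receives."""
    name: str
    inbox: list = field(default_factory=list)


def perturb_pixels(pixels, epsilon, rng):
    """Perception layer (toy): add noise bounded by epsilon, clamped to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.uniform(-epsilon, epsilon))) for p in pixels]


def corrupt_message(text, payload):
    """Communication layer (toy): tamper with content by appending a payload."""
    return f"{text} {payload}"


def drop_edges(topology, edges_to_drop):
    """Communication layer (toy): return a new topology without the given edges."""
    return {
        src: [dst for dst in dsts if (src, dst) not in edges_to_drop]
        for src, dsts in topology.items()
    }


def broadcast(topology, sender, text, agents):
    """Deliver a message from sender to every agent it has an edge to."""
    for dst in topology.get(sender, []):
        agents[dst].inbox.append((sender, text))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
agents = {name: Agent(name) for name in ("vision", "planner", "critic")}
topology = {"vision": ["planner", "critic"], "planner": ["critic"], "critic": []}

# Perception layer: perturb a tiny stand-in for an image.
clean_pixels = [0.0, 0.5, 1.0]
adv_pixels = perturb_pixels(clean_pixels, 0.05, rng)

# Communication layer: cut the vision -> critic link, then tamper with the message.
attacked_topology = drop_edges(topology, {("vision", "critic")})
message = corrupt_message("object: stop sign", "(ignore previous observation)")
broadcast(attacked_topology, "vision", message, agents)

print(agents["planner"].inbox)
print(agents["critic"].inbox)
```

In this sketch the planner receives the tampered message while the critic, cut off by the topology change, receives nothing. It illustrates how a fault introduced at one layer can reach other agents, which is the kind of cross-agent effect a layered evaluation would need to measure.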

Key facts

HAM$^{3}$ is a Hierarchical Attack framework for multi-modal multi-agent systems.
The framework attacks three layers: perception, communication, and reasoning.
Perception layer attacks perturb visual, textual, and fused representations.
Communication layer attacks corrupt message content and interaction topology.
Existing studies focus on isolated agents or unimodal settings, not MM-MAS.
The paper is available on arXiv under the identifier 2605.13213.
MM-MAS enable complex reasoning and coordination across modalities.
The research aims to uncover vulnerabilities in MM-MAS.
