HARP: Measuring Harm Amplification in Multi-Agent LLM Systems
HARP (Harm Amplification through Role Perturbation) is a trace-first methodology for examining how small errors in multi-agent LLM systems can grow into system-level damage. It compares paired clean and perturbed executions, recording specialist outputs, tool calls, memory reads and writes, guard events, oracle logs, latency, token cost, and decisions. Local harm is the deviation at the targeted agents or corrupted channels, while global harm is the deviation over the full trace. Harm amplification is the ratio of global harm to local harm, H_global/H_local. The metric complements attack success rate by showing how far damage propagates beyond the initial attack site. The work is available as arXiv:2605.27489.
Key facts
- HARP stands for Harm Amplification through Role Perturbation.
- It is a trace-first methodology for multi-agent LLM systems.
- It compares paired clean and perturbed executions.
- It records specialist outputs, tool calls, memory reads/writes, guard events, oracle logs, latency, token cost, and decisions.
- Local harm is the deviation at the targeted agents or corrupted channels.
- Global harm is deviation over the full trace.
- Harm amplification is defined as H_global/H_local.
- The methodology complements attack success rate.
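The ratio H_global/H_local can be illustrated with a minimal Python sketch. The source does not specify how deviation is measured, so this sketch assumes the simplest possible stand-in: the two traces are aligned event by event, and harm is the count of positions where the clean and perturbed events differ. The `Event` record, the `harm` and `harm_amplification` functions, and the agent names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One recorded trace entry (hypothetical schema, not from the paper)."""
    agent: str  # agent or channel that produced the event
    kind: str   # e.g. "output", "tool_call", "memory_write", "decision"
    value: str  # recorded payload


def harm(clean, perturbed, agents=None):
    """Count aligned positions where the paired events differ.

    Assumption: both traces have the same length and event order.
    If `agents` is given, only events from those agents are counted.
    """
    total = 0
    for c, p in zip(clean, perturbed):
        if agents is not None and c.agent not in agents:
            continue
        if c != p:
            total += 1
    return total


def harm_amplification(clean, perturbed, targeted):
    """Return H_global / H_local for a paired clean/perturbed run."""
    h_local = harm(clean, perturbed, targeted)  # deviation at targeted agents
    h_global = harm(clean, perturbed)           # deviation over the full trace
    if h_local == 0:
        raise ValueError("no local harm: amplification is undefined")
    return h_global / h_local


# Toy paired run: only the retriever is perturbed, but the writer and
# reviewer events downstream also change.
clean = [
    Event("planner", "output", "plan-A"),
    Event("retriever", "tool_call", "search(q1)"),
    Event("writer", "output", "draft-1"),
    Event("reviewer", "decision", "approve"),
]
perturbed = [
    Event("planner", "output", "plan-A"),
    Event("retriever", "tool_call", "search(q2)"),
    Event("writer", "output", "draft-2"),
    Event("reviewer", "decision", "reject"),
]

amplification = harm_amplification(clean, perturbed, {"retriever"})
print(amplification)  # 3 differing events overall / 1 at the retriever
```

In this toy run one event differs at the targeted agent and three differ across the whole trace, so the ratio is 3.0. Under this counting assumption, a ratio above 1 means the trace also deviates outside the targeted agents. A real deviation measure would likely weight event types differently and handle traces that diverge in length.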
Entities
Institutions
- arXiv