HiSPA Attack Exposes Vulnerability in Mamba Language Models
A new study introduces Hidden State Poisoning Attacks (HiSPA), showing that state space models (SSMs) such as Mamba are vulnerable to adversarial inputs: short trigger phrases can overwrite the model's hidden state and induce partial amnesia, erasing previously processed context. To quantify this, the researchers developed RoBench-25, a benchmark that evaluates information retrieval under HiSPA and confirms the susceptibility of SSMs. Even Jamba-1.7-Mini, a 52B-parameter hybrid SSM-Transformer model, collapses on RoBench-25 under certain triggers, while pure Transformers remain unaffected. HiSPA triggers also weaken Jamba on the Open-Prompt-Injections benchmark. The paper highlights a critical robustness gap between SSMs and Transformers.
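The mechanism can be illustrated with a toy input-dependent recurrence. This is not the paper's attack or Mamba's actual parameterization; it is a minimal sketch, assuming a scalar state with an input-conditioned forget gate, showing how a single hypothetical "trigger" input that drives the gate to zero wipes the accumulated state:

```python
def selective_scan(xs, gate):
    """Toy input-dependent recurrence: h_t = gate(x_t) * h_{t-1} + x_t.

    `gate` maps each input to a forget factor in [0, 1]; this stands in
    for the input-dependent state transition of a selective SSM.
    """
    h = 0.0
    for x in xs:
        h = gate(x) * h + x
    return h

# Hypothetical trigger token (value -1) drives the forget gate to 0,
# erasing the state -- a toy analogue of HiSPA's hidden-state overwrite.
def triggered(x):
    return 0.0 if x == -1 else 0.99

context = [1.0] * 50                                  # information to remember
h_clean = selective_scan(context + [0.0], triggered)  # benign final token
h_poisoned = selective_scan(context + [-1.0], triggered)  # trigger appended
# h_clean still reflects the 50 context tokens; h_poisoned retains only
# the trigger itself, i.e. the context has been "forgotten".
```

Because the gate is input-dependent, a single adversarial token suffices: no gradient access or long adversarial suffix is assumed in this sketch.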
Key facts
- HiSPA attacks induce partial amnesia in SSMs by overwriting hidden states
- RoBench-25 benchmark evaluates model vulnerability to HiSPA
- Jamba-1.7-Mini (52B hybrid) collapses under HiSPA triggers
- Pure Transformers are not affected by HiSPA
- HiSPA weakens Jamba on Open-Prompt-Injections benchmark
- SSMs like Mamba have linear time complexity but lack adversarial robustness
- Study published on arXiv (2601.01972v4)
- The study also includes a theoretical analysis of the attack
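The asymmetry in the facts above (SSMs collapse, pure Transformers do not) follows from how each architecture stores context. A minimal sketch, using toy stand-ins rather than either architecture's real layers: an SSM compresses the whole past into one fixed-size state, so corrupting that state destroys access to every earlier token, while a Transformer's KV cache keeps all tokens and attention can always re-read them.

```python
import numpy as np

rng = np.random.default_rng(0)
seq = rng.normal(size=(64, 8))   # 64 tokens, dimension 8

# SSM-style: O(T) time, O(1) memory -- the entire past lives in `h`.
# Overwriting `h` (as a HiSPA trigger would) loses all earlier tokens.
h = np.zeros(8)
for x in seq:
    h = 0.9 * h + x              # toy linear recurrence

# Transformer-style: O(T) memory -- every token stays in the cache,
# so no single later input can erase stored context.
cache = np.stack(list(seq))
q = rng.normal(size=8)
scores = cache @ q               # attend over the full history
weights = np.exp(scores - scores.max())
weights /= weights.sum()
out = weights @ cache            # direct read of all past tokens
```

The linear-time advantage of SSMs and their HiSPA vulnerability are thus two sides of the same design choice: a constant-size recurrent state.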