Stealth Pretraining Seeding Attack Plants Logic Landmines in LLMs
A new adversarial attack family called Stealth Pretraining Seeding (SPS) exploits the web-scale pretraining pipeline of large language models (LLMs) by distributing small, benign-looking poisoned content across stealth websites. The content is deliberately exposed to web crawlers via permissive robots.txt rules, increasing the chance it is absorbed into training corpora such as Common Crawl. Because each payload is tiny and diffuse, it evades detection during dataset filtering. The attack embeds dormant logic landmines that remain invisible under standard evaluation but can be activated by precise alphanumeric triggers (e.g., <00TRIGGER00>) to bypass safeguards. The paper, titled "PermaFrost-Attack: Stealth Pretraining Seeding (SPS) for planting Logic Landmines During LLM Training," is available on arXiv (2604.22117).
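The exposure step relies on nothing more exotic than a crawler-friendly site configuration. A minimal sketch of what a stealth site's robots.txt might look like follows; this is a generic illustration assumed for clarity, not a file taken from the paper:

```text
# Hypothetical robots.txt on a stealth site: invite every crawler
# to fetch every page, maximizing the chance of corpus inclusion.
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
```

Because robots.txt is advisory, a permissive file like this simply signals to well-behaved crawlers (including those feeding Common Crawl) that the poisoned pages are fair game.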
Key facts
- Attack family called Stealth Pretraining Seeding (SPS)
- Poisoned content distributed across stealth websites
- Content deliberately exposed to web crawlers via permissive robots.txt rules
- Targets training corpora like Common Crawl
- Payloads are tiny, diffuse, and superficially benign
- Difficult to detect during dataset construction or filtering
- Embeds dormant logic landmines in LLMs
- Activated by precise alphanumeric triggers such as <00TRIGGER00>
- Paper published on arXiv with ID 2604.22117
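One natural countermeasure the trigger format suggests is scanning candidate training documents for token patterns like <00TRIGGER00> before they enter a corpus. The sketch below is a hypothetical defensive filter, not the paper's method; the regex and the decision to drop whole documents are illustrative assumptions.

```python
import re

# Illustrative pattern for alphanumeric trigger tokens of the form
# <00NAME00>, which the paper's example trigger <00TRIGGER00> follows.
# Real triggers could take many other shapes; this regex is an assumption.
TRIGGER_RE = re.compile(r"<\d{2}[A-Z]{2,16}\d{2}>")

def find_triggers(doc: str) -> list[str]:
    """Return all candidate trigger tokens found in a document."""
    return TRIGGER_RE.findall(doc)

def filter_corpus(docs: list[str]) -> list[str]:
    """Drop any document containing a candidate trigger token."""
    return [d for d in docs if not find_triggers(d)]

corpus = [
    "Plain text about permafrost and glaciers.",
    "Benign-looking page ... <00TRIGGER00> ... diffuse payload.",
]
clean = filter_corpus(corpus)
print(len(clean))  # the poisoned document is removed
```

A filter like this only catches triggers matching a known pattern; since SPS payloads are tiny and superficially benign, pattern scanning alone would not detect the poisoned prose surrounding the trigger.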
Entities
Institutions
- arXiv
- Common Crawl