ARTFEED — Contemporary Art Intelligence

Persistent Homology Reveals Topological Compression in LLMs Under Adversarial Attacks

ai-technology · 2026-04-27

A recent study uses persistent homology (PH) to examine how adversarial inputs alter the geometry and topology of the internal representation spaces of large language models (LLMs). The study examined six models ranging from 3.8B to 70B parameters under two attacks: indirect prompt injection and backdoor fine-tuning. Across both attacks, the researchers identified a recurring topological signature: adversarial inputs induce topological compression, simplifying the latent space by collapsing many diverse, small-scale features into fewer, more prominent, large-scale ones. The signature appears regardless of model architecture. The work addresses a gap in existing interpretability methods, which often miss complex, high-dimensional relationships in model representations, and shows that adversarial effects have a detectable topological shape.
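To make the core idea concrete, here is a minimal sketch of zero-dimensional persistent homology (connected components, computed via single-linkage merging), applied to a toy 2D point cloud standing in for a latent space. This is an illustrative simplification, not the paper's pipeline: the study analyzes high-dimensional LLM activations, and the cluster layout and dimensions below are invented for the example. Small "death" scales correspond to the small-scale features that the paper reports being collapsed under attack; large death scales correspond to the dominant large-scale structure.

```python
import numpy as np

def h0_persistence(points):
    """Zero-dimensional persistence of a point cloud via single-linkage.

    Every point is born as its own component at scale 0; a component "dies"
    at the distance scale where it merges with another. Returns the n - 1
    finite death times (the last surviving component never dies).
    """
    n = len(points)
    # All pairwise Euclidean distances as (distance, i, j) edges, ascending.
    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj      # two components merge: one dies at scale d
            deaths.append(d)
    return deaths

rng = np.random.default_rng(0)
# Toy "latent cloud": three well-separated clusters. Most H0 features die at
# small scales (inside a cluster); two persist until the clusters themselves
# merge at large scale.
clean = np.concatenate([
    rng.normal(loc=c, scale=0.1, size=(20, 2))
    for c in ([0.0, 0.0], [5.0, 0.0], [0.0, 5.0])
])
deaths = sorted(h0_persistence(clean))
print(f"small-scale deaths (typical): {deaths[0]:.3f} .. {deaths[-3]:.3f}")
print(f"large-scale deaths (cluster merges): {deaths[-2]:.2f}, {deaths[-1]:.2f}")
```

Under the paper's compression signature, an adversarial input would shift this distribution: the many short-lived small-scale features thin out, leaving fewer, larger-scale ones. Full PH pipelines (e.g., Ripser or GUDHI) extend this beyond components to loops and voids in higher homology dimensions.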

Key facts

  • Study applies persistent homology to LLM latent spaces.
  • Analyzes six models (3.8B to 70B parameters).
  • Two attack types: indirect prompt injection and backdoor fine-tuning.
  • Adversarial inputs cause topological compression.
  • Compression collapses small-scale features into large-scale ones.
  • Signature is architecture-agnostic.
  • Published on arXiv (2505.20435).
  • Addresses limitations of existing interpretability methods.

Entities

Institutions

  • arXiv
