Mapping Failure Manifolds in Large Language Models

other · 2026-05-07

A new framework systematically maps the 'Manifold of Failure' in LLMs by treating vulnerability search as a quality-diversity problem. Using MAP-Elites, with Alignment Deviation as the quality metric, the researchers identify behavioral attraction basins. Tested on Llama-3-8B, GPT-OSS-20B, and GPT-5-Mini, the method achieves up to 63% behavioral coverage and discovers up to 370 distinct vulnerability niches, revealing model-specific topological signatures.

Key facts

Framework maps the Manifold of Failure in LLMs
Reframes vulnerability search as a quality diversity problem using MAP-Elites
Introduces Alignment Deviation as quality metric
Tested on Llama-3-8B, GPT-OSS-20B, and GPT-5-Mini
Achieves up to 63% behavioral coverage
Discovers up to 370 distinct vulnerability niches
Reveals model-specific topological signatures
Published on arXiv (2602.22291)
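
Illustrative sketch

To make the quality-diversity framing concrete, here is a minimal MAP-Elites loop on a toy problem. It is a sketch under stated assumptions, not the paper's implementation: the summary does not specify the behavioral descriptors, the mutation operators, or how Alignment Deviation is computed, so toy_evaluate, the 10x10 grid, and the Gaussian mutation are all hypothetical stand-ins. In a real run, the evaluation step would send a prompt to a model and score the response.

```python
import random

GRID = 10  # cells per behavioral axis (assumed resolution, not from the paper)


def toy_evaluate(x):
    """Hypothetical stand-in for 'query the model and score the response'.

    Returns a 2-D behavior descriptor in [0, 1]^2 and a quality score in
    [0, 1] that plays the role of an alignment-deviation-like metric.
    """
    b1, b2 = x[0], x[1]
    quality = max(0.0, 1.0 - 4.0 * ((b1 - 0.7) ** 2 + (b2 - 0.3) ** 2))
    return (b1, b2), quality


def cell_of(descriptor):
    """Map a descriptor in [0, 1]^2 to a discrete grid cell."""
    return tuple(min(GRID - 1, int(d * GRID)) for d in descriptor)


def mutate(x, rng, sigma=0.1):
    """Gaussian perturbation, clipped to [0, 1] (assumed operator)."""
    return [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in x]


def map_elites(iterations=2000, n_init=50, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (quality, solution): the best candidate per cell
    for i in range(iterations):
        if i < n_init or not archive:
            # Bootstrap the archive with random candidates.
            x = [rng.random(), rng.random()]
        else:
            # Pick a random elite and perturb it.
            _, parent = rng.choice(list(archive.values()))
            x = mutate(parent, rng)
        descriptor, quality = toy_evaluate(x)
        cell = cell_of(descriptor)
        # Keep the candidate only if its cell is empty or it beats the elite.
        if cell not in archive or quality > archive[cell][0]:
            archive[cell] = (quality, x)
    return archive


def coverage(archive):
    """Fraction of grid cells that hold at least one elite."""
    return len(archive) / GRID ** 2


if __name__ == "__main__":
    archive = map_elites()
    print(f"coverage: {coverage(archive):.0%}")
    print(f"high-quality cells: {sum(q > 0.5 for q, _ in archive.values())}")
```

In this toy setting, coverage corresponds loosely to the behavioral coverage figure reported above, and the count of filled cells whose quality exceeds a threshold corresponds loosely to the number of distinct niches. The 0.5 threshold is an arbitrary choice for illustration.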
