ARTFEED — Contemporary Art Intelligence

Mapping Failure Manifolds in Large Language Models

other · 2026-05-07

A new framework maps the 'Manifold of Failure' in large language models (LLMs) by treating vulnerability search as a quality-diversity problem. Using the MAP-Elites algorithm, the researchers identify behavioral attraction basins and score them with a metric called Alignment Deviation. Tested on Llama-3-8B, GPT-OSS-20B, and GPT-5-Mini, the method reaches up to 63% behavioral coverage and discovers up to 370 distinct vulnerability niches, revealing topological signatures specific to each model.
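For readers unfamiliar with MAP-Elites, the following is a minimal, generic sketch of the algorithm and of how a coverage figure and a niche count can be read off its archive. It is not the paper's implementation: the `evaluate` function is a synthetic stand-in for querying a model, the 2-D behavior descriptor, the 10 x 10 grid, the mutation scale, and the 0.5 niche threshold are all assumptions chosen for illustration.

```python
import random

GRID = 10  # assumed resolution: 10 x 10 = 100 behavioral cells


def evaluate(candidate):
    """Synthetic stand-in for probing a model (hypothetical).

    Returns (quality, descriptor). Quality plays the role of a
    deviation score; the descriptor is a 2-D behavior vector in [0, 1]^2.
    """
    x, y = candidate
    quality = max(0.0, 1.0 - ((x - 0.7) ** 2 + (y - 0.3) ** 2))
    return quality, (x, y)


def cell_of(descriptor):
    """Map a descriptor in [0, 1]^2 to a discrete grid cell."""
    return tuple(min(GRID - 1, int(d * GRID)) for d in descriptor)


def mutate(candidate, rng, sigma=0.1):
    """Gaussian perturbation, clipped back into [0, 1]."""
    return tuple(min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in candidate)


def map_elites(iterations=2000, n_init=20, seed=0):
    """Keep the best-scoring candidate (elite) found for each grid cell."""
    rng = random.Random(seed)
    archive = {}  # cell -> (quality, candidate)
    for i in range(iterations):
        if i < n_init or not archive:
            candidate = (rng.random(), rng.random())  # random bootstrap
        else:
            _, parent = rng.choice(list(archive.values()))
            candidate = mutate(parent, rng)
        quality, descriptor = evaluate(candidate)
        cell = cell_of(descriptor)
        # Replace the cell's elite only if the newcomer scores higher.
        if cell not in archive or quality > archive[cell][0]:
            archive[cell] = (quality, candidate)
    return archive


def coverage(archive):
    """Fraction of grid cells that hold an elite."""
    return len(archive) / (GRID * GRID)


def count_niches(archive, threshold=0.5):
    """Number of filled cells whose elite meets an assumed quality threshold."""
    return sum(1 for quality, _ in archive.values() if quality >= threshold)


if __name__ == "__main__":
    result = map_elites()
    print(f"coverage: {coverage(result):.0%}, niches: {count_niches(result)}")
```

In this toy setup, "behavioral coverage" is the share of grid cells that end up with an elite, and a "niche" is a filled cell whose elite clears the threshold; how the paper defines its descriptors and thresholds may differ.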

Key facts

  • Framework maps the Manifold of Failure in LLMs
  • Reframes vulnerability search as a quality-diversity problem solved with MAP-Elites
  • Introduces Alignment Deviation as the quality metric
  • Tested on Llama-3-8B, GPT-OSS-20B, and GPT-5-Mini
  • Achieves up to 63% behavioral coverage
  • Discovers up to 370 distinct vulnerability niches
  • Reveals model-specific topological signatures
  • Published on arXiv (2602.22291)

Entities

Institutions

  • arXiv
