Mathematical Encoding Bypasses LLM Safety Filters with Up to 56% Success
A new study reveals that encoding harmful prompts as mathematical problems—using set theory, formal logic, and quantum mechanics—bypasses LLM safety filters with 46%–56% average attack success across eight models. The key factor is deep reformulation into genuine mathematical problems, not mere formatting. The research introduces a Formal Logic encoding achieving comparable success to Set Theory, showing the vulnerability generalizes across formalisms.
Key facts
- Harmful prompts encoded as mathematical problems bypass LLM safety filters at 46%–56% average attack success.
- Eight target models and two benchmarks were tested.
- Effectiveness depends on deep reformulation into genuine mathematical problems, not just mathematical notation.
- Rule-based encodings without reformulation perform no better than unencoded baselines.
- A novel Formal Logic encoding achieves attack success comparable to Set Theory.
- The vulnerability generalizes across mathematical formalisms including set theory, formal logic, and quantum mechanics.
- The study is available on arXiv under ID 2605.03441.
- LLM safety mechanisms primarily rely on semantic pattern matching, which this attack exploits.