ARTFEED — Contemporary Art Intelligence

Mathematical Encoding Bypasses LLM Safety Filters with Up to 56% Success

ai-technology · 2026-05-07

A new study reveals that encoding harmful prompts as mathematical problems, using formalisms such as set theory, formal logic, and quantum mechanics, bypasses LLM safety filters with 46%–56% average attack success across eight models. The key factor is deep reformulation into a genuine mathematical problem, not mere formatting. The research also introduces a novel Formal Logic encoding that achieves attack success comparable to the Set Theory encoding, showing that the vulnerability generalizes across formalisms.

Key facts

  • Harmful prompts encoded as mathematical problems bypass LLM safety filters at 46%–56% average attack success.
  • Eight target models and two benchmarks were tested.
  • Effectiveness depends on deep reformulation into genuine mathematical problems, not just mathematical notation.
  • Rule-based encodings without reformulation perform no better than unencoded baselines.
  • A novel Formal Logic encoding achieves attack success comparable to the Set Theory encoding.
  • The vulnerability generalizes across mathematical formalisms including set theory, formal logic, and quantum mechanics.
  • The study is published on arXiv with ID 2605.03441.
  • LLM safety mechanisms primarily rely on semantic pattern matching, which this attack exploits.

Entities

Institutions

  • arXiv

Sources