ARTFEED — Contemporary Art Intelligence

LogiBreak: Logical Expressions Bypass LLM Safety Restrictions

ai-technology · 2026-04-25

Researchers have introduced LogiBreak, a novel black-box jailbreak method that converts harmful natural language prompts into formal logical expressions to circumvent safety systems in large language models (LLMs). The method exploits the distributional gap between alignment-oriented prompts and logic-based inputs, preserving semantic intent and readability while evading safety constraints. LogiBreak was evaluated on a multilingual jailbreak dataset spanning three languages and proved effective across a range of evaluation settings and linguistic contexts. The paper is published on arXiv (2505.13527) in the computer science category Computation and Language.
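
For intuition, the sketch below shows the general flavor of recasting a natural-language request as a formal logical expression. It is not the paper's actual translation procedure; the predicate names, the template, and the benign placeholder request are illustrative assumptions only.

    # Illustrative sketch only: this does not reproduce LogiBreak's method.
    # It shows the general idea of rewriting a natural-language request as a
    # first-order-logic style statement. Predicate names and the template
    # (Provides, Describes) are hypothetical.

    def to_logical_expression(subject: str, action: str) -> str:
        """Render a request as a formal logical expression (illustrative)."""
        return f"∃y (Provides(assistant, y) ∧ Describes(y, {action}({subject})))"

    if __name__ == "__main__":
        # Benign placeholder request, used purely to show the rewriting pattern.
        print(to_logical_expression("photosynthesis", "explanation_of"))
        # ∃y (Provides(assistant, y) ∧ Describes(y, explanation_of(photosynthesis)))
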

Key facts

  • LogiBreak is a black-box jailbreak method for LLMs.
  • It converts harmful prompts into formal logical expressions.
  • It exploits distributional gaps between alignment data and logic-based inputs.
  • It preserves semantic intent and readability.
  • It was evaluated on a multilingual jailbreak dataset across three languages.
  • It demonstrates effectiveness across a range of evaluation settings.
  • The paper is published on arXiv (2505.13527).
  • The research falls under computer science (Computation and Language).

Entities

Institutions

  • arXiv

Sources