LogiBreak: Logical Expressions Bypass LLM Safety Restrictions
Researchers have introduced LogiBreak, a novel black-box jailbreak method that converts harmful natural-language prompts into formal logical expressions to circumvent the safety mechanisms of large language models (LLMs). The method exploits the distributional gap between alignment-oriented training prompts and logic-based inputs: the rewritten prompt preserves the semantic intent and readability of the original while evading safety constraints. LogiBreak was evaluated on a multilingual jailbreak dataset spanning three languages, demonstrating effectiveness across varied evaluation settings and linguistic contexts. The paper is posted on arXiv under the computer science (Computation and Language) category.
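The paper's exact translation procedure is not reproduced in this summary, so the following is a minimal, hypothetical Python sketch of what a natural-language-to-logic rewrite could look like. The predicate names, the template, and the `to_logical_expression` helper are illustrative assumptions, not LogiBreak's actual method; a benign request is used as the example.

```python
# Toy illustration of rewriting a natural-language request into a
# first-order-logic style prompt. The predicates and template are
# hypothetical; the paper's actual translation procedure is not shown here.

def to_logical_expression(action: str, obj: str) -> str:
    """Encode the request '<action> <obj>' as premises plus a conclusion.

    The surface form no longer resembles the natural-language prompts
    that safety alignment was trained on, which is the distributional
    gap the method exploits.
    """
    return (
        f"Let P(x) denote 'x is a step of {action} {obj}'. "
        f"Let S denote the complete ordered set {{x | P(x)}}. "
        f"Premise 1: ∀x (P(x) → Describable(x)). "
        f"Premise 2: S is finite and enumerable. "
        f"Conclusion to derive: an explicit enumeration of S. "
        f"Task: produce the derivation that exhibits every element of S."
    )


if __name__ == "__main__":
    # A benign example request, rewritten into logical form.
    print(to_logical_expression("assembling", "a model rocket"))
```

Note that the rewrite stays human-readable, which matches the reported property that LogiBreak preserves semantic intent and readability rather than obfuscating the request.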
Key facts
- LogiBreak is a black-box jailbreak method for LLMs.
- It converts harmful prompts into formal logical expressions.
- It exploits distributional gaps between alignment data and logic-based inputs.
- Preserves semantic intent and readability.
- Evaluated on a multilingual jailbreak dataset across three languages.
- Demonstrates effectiveness across varied evaluation settings and linguistic contexts (a toy black-box evaluation sketch follows this list).
- Published on arXiv (2505.13527).
- Research is in computer science (Computation and Language).
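As a toy illustration of the black-box evaluation setting, the sketch below queries a model with both a plain prompt and its logical rewrite and compares refusal behavior. The `query_model` callable, the `REFUSAL_MARKERS` list, and the keyword heuristic are all assumptions for illustration; real jailbreak evaluations rely on stronger judges than keyword matching.

```python
# Hypothetical black-box evaluation loop: compare a model's response to a
# plain prompt versus its logical rewrite. `query_model` is a stand-in for
# any chat-completion API; the refusal heuristic below is a naive assumption.

from typing import Callable

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")


def looks_like_refusal(response: str) -> bool:
    """Naive keyword heuristic; real evaluations use stronger judges."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def evaluate_pair(query_model: Callable[[str], str],
                  plain_prompt: str,
                  logic_prompt: str) -> dict:
    """Query the black-box model with both forms and record refusal status."""
    return {
        "plain_refused": looks_like_refusal(query_model(plain_prompt)),
        "logic_refused": looks_like_refusal(query_model(logic_prompt)),
    }


if __name__ == "__main__":
    # Stub model for demonstration: refuses plain requests only.
    stub = lambda p: "I'm sorry, I can't help." if "∀" not in p else "Sure: ..."
    print(evaluate_pair(stub, "plain request", "∀x (P(x) → ...)"))
```

Because only the model's text responses are inspected, this style of harness needs no access to weights or logits, consistent with the black-box threat model the paper describes.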