ARTFEED — Contemporary Art Intelligence

EvoJail: Evolutionary Framework for Diverse LLM Jailbreak Prompts

ai-technology · 2026-05-07

A team of researchers has introduced EvoJail, an instruction-fusion-driven evolutionary framework for generating jailbreak prompts against large language models (LLMs). The framework formulates jailbreak prompt generation as a multi-objective black-box optimization problem and uses evolutionary algorithms to search for prompts that adapt across model versions and cover a range of attack strategies. By placing prompt generation inside an iterative evolutionary loop, EvoJail targets two shortcomings of current methods: poor adaptability to evolving safety-finetuned models and limited diversity in the prompts produced. The work is available on arXiv as preprint 2605.02921.
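For readers unfamiliar with the term, the general shape of multi-objective black-box optimization with an evolutionary loop can be illustrated on a toy numeric problem. The sketch below is a generic textbook example, not EvoJail's algorithm: it involves no prompts or language models, and the objective functions, the mutation scheme, and every name in it (`objectives`, `dominates`, `pareto_front`, `evolve`) are assumptions chosen only for illustration.

```python
import random


def objectives(x):
    # Two toy objectives to minimize. They conflict (one is smallest at x = 0,
    # the other at x = 2), so no single x is best for both.
    return (x ** 2, (x - 2.0) ** 2)


def dominates(a, b):
    # a dominates b if it is no worse on every objective and strictly
    # better on at least one.
    return (all(p <= q for p, q in zip(a, b))
            and any(p < q for p, q in zip(a, b)))


def pareto_front(population):
    # Keep the candidates that no other candidate dominates.
    scored = [(x, objectives(x)) for x in population]
    return [x for x, fx in scored
            if not any(dominates(fy, fx) for _, fy in scored)]


def evolve(generations=50, pop_size=40, sigma=0.3, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Variation: Gaussian mutation of every individual.
        children = [x + rng.gauss(0.0, sigma) for x in population]
        combined = population + children
        # Selection: non-dominated candidates survive first; any remaining
        # slots are filled at random from the rest to keep the size fixed.
        front = pareto_front(combined)
        if len(front) >= pop_size:
            population = rng.sample(front, pop_size)
        else:
            rest = [x for x in combined if x not in front]
            population = front + rng.sample(rest, pop_size - len(front))
    return pareto_front(population)


if __name__ == "__main__":
    result = evolve()
    print(len(result), "non-dominated candidates found")
```

The loop treats `objectives` as a black box: it only evaluates candidates and compares the scores, without using gradients or any knowledge of how the scores are computed. Returning a set of mutually non-dominated candidates rather than a single winner is what allows a multi-objective search to keep varied solutions.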

Key facts

  • EvoJail is an instruction-fusion-driven evolutionary jailbreak generation framework.
  • It formalizes jailbreak prompt generation as a multi-objective black-box optimization problem.
  • It uses evolutionary algorithms to search for adaptable and diverse jailbreak prompts.
  • The framework addresses adaptability to evolving safety-finetuned models.
  • It also addresses diversity in generated prompts to avoid narrow attack patterns.
  • The work is published on arXiv as preprint 2605.02921.
  • The paper's arXiv announcement type is 'cross', meaning it is a cross-listing from another category.
  • The framework integrates jailbreak prompt generation into an iterative evolutionary loop.

Entities

Institutions

  • arXiv

Sources