EvoJail: Evolutionary Framework for Diverse LLM Jailbreak Prompts
A team of researchers has introduced EvoJail, an instruction-fusion-driven evolutionary framework for generating jailbreak prompts against large language models (LLMs). The framework formulates jailbreak prompt generation as a multi-objective black-box optimization problem and uses evolutionary algorithms to search for prompts that transfer across model versions and span a range of attack strategies. By embedding prompt generation in an iterative evolutionary loop, EvoJail aims to address two limitations of existing methods: poor adaptability to evolving safety-finetuned models and low diversity among the prompts they produce. The paper is available on arXiv as preprint 2605.02921.
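In multi-objective optimization, candidates are usually compared by Pareto dominance rather than by a single score. The summary does not state which selection scheme EvoJail uses, so the sketch below is only a generic illustration of that idea under stated assumptions: it keeps the candidates that no other candidate beats on every objective at once. The function names, the candidate ids, and the two abstract objectives (both maximized) are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical: each candidate is scored on several objectives, all to be maximized.
Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(population: Dict[str, Scores]) -> List[str]:
    """Return the ids of candidates that are not dominated by any other candidate."""
    return [
        cid
        for cid, scores in population.items()
        if not any(
            dominates(other, scores)
            for oid, other in population.items()
            if oid != cid
        )
    ]


# Hypothetical population scored on two abstract objectives (objective A, objective B).
population = {
    "c1": (0.9, 0.2),
    "c2": (0.6, 0.6),
    "c3": (0.5, 0.5),  # beaten by c2 on both objectives
    "c4": (0.2, 0.9),
}

print(pareto_front(population))  # ['c1', 'c2', 'c4']
```

Keeping the whole non-dominated set, instead of a single best candidate, is one common way multi-objective evolutionary methods preserve a spread of trade-offs between objectives.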
Key facts
- EvoJail is an instruction-fusion-driven evolutionary jailbreak generation framework.
- It formalizes jailbreak prompt generation as a multi-objective black-box optimization problem.
- It uses evolutionary algorithms to search for adaptable and diverse jailbreak prompts.
- The framework targets adaptability to evolving safety-finetuned models.
- It also targets diversity in generated prompts, to avoid narrow attack patterns.
- The work is published on arXiv as preprint 2605.02921.
- The paper's arXiv announcement type is 'cross', meaning it is cross-listed from another primary category.
- The framework integrates jailbreak prompt generation into an iterative evolutionary loop.
Entities
Institutions
- arXiv