AE-CoT: Adaptive Evolutionary Jailbreak for LLMs
A new research paper on arXiv (2605.24497) proposes AE-CoT, an adaptive evolutionary chain-of-thought (CoT) jailbreak framework targeting Large Reasoning Models (LRMs). The method rewrites harmful goals into mild prompts using teacher role-play, decomposes them into coherent reasoning fragments, and runs a multi-generation evolutionary search to expand candidate diversity. The explicit CoT mechanisms of LRMs are a vulnerability that static jailbreak templates fail to exploit effectively, because such templates have limited diversity and adaptability. AE-CoT is designed to overcome that limitation.
Key facts
- arXiv:2605.24497
- AE-CoT framework
- targets Large Reasoning Models (LRMs)
- uses an adaptive evolutionary chain-of-thought jailbreak approach
- rewrites harmful goals into mild prompts with teacher role-play
- decomposes prompts into reasoning fragments
- multi-generation evolutionary search
- addresses limitations of static CoT jailbreak templates
Entities
Institutions
- arXiv