PEO: Direct Embedding Optimization for LLM Jailbreaking

ai-technology · 2026-04-30

Researchers propose Prompt Embedding Optimization (PEO), a multi-round white-box jailbreak method that directly optimizes the embeddings of the original prompt tokens rather than appending adversarial tokens. Contrary to prior concerns, the authors report that the optimized embeddings remain close to the originals, so the visible prompt string is preserved after nearest-token projection. Their quantitative analysis indicates that model responses stay on topic for most prompts. PEO combines continuous embedding-space optimization with structured continuation targets.

Key facts

PEO is a multi-round white-box jailbreak method.
It directly optimizes embeddings of original prompt tokens.
No adversarial tokens are appended.
The optimized embeddings remain close to the originals.
The visible prompt string is preserved after nearest-token projection.
Model responses stay on topic for the large majority of prompts.
PEO combines continuous embedding-space optimization with structured continuation targets.
The paper is available on arXiv (2604.24983).
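
The nearest-token projection mentioned in the key facts can be illustrated with a toy lookup. This is a minimal sketch, not the paper's implementation: the four-word vocabulary, the 2-d embedding table, the squared-L2 distance and the perturbation size are all assumptions chosen for illustration. It shows only why embeddings that stay close to their originals map back to the same visible tokens.

```python
import random

def nearest_token(vec, table):
    """Return the index of the row in `table` closest to `vec` (squared L2, an assumed metric)."""
    return min(
        range(len(table)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, table[i])),
    )

def project(embeddings, table):
    """Map each embedding vector to the id of its nearest token embedding."""
    return [nearest_token(v, table) for v in embeddings]

# Hypothetical toy vocabulary with 2-d embeddings (not from the paper).
VOCAB = ["the", "cat", "sat", "down"]
TABLE = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

prompt_ids = [0, 1, 2]
original = [TABLE[i][:] for i in prompt_ids]

# Stand-in for embeddings that have moved slightly: each coordinate
# shifts by at most 0.1, far less than the spacing between tokens.
rng = random.Random(0)
perturbed = [[x + rng.uniform(-0.1, 0.1) for x in v] for v in original]

projected_ids = project(perturbed, TABLE)
print(" ".join(VOCAB[i] for i in projected_ids))  # prints "the cat sat"
```

Because every perturbed vector is still much nearer to its own token than to any other, the projected ids equal the original ids and the visible string is unchanged.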
