SPOT: Inference-Time Safety for Text-to-Image Generation via Prompt Projection
Researchers propose SPOT (Selective Prompt Projection), an inference-time framework for safe text-to-image generation with frozen diffusion models. The method addresses the tension between suppressing unsafe outputs and preserving benign behavior. It formalizes the Safety-Prompt Alignment Tradeoff (SPAT), under which reducing expected unsafety requires deviating from the prompt-conditioned distribution, and uses total variation (TV) to bound how much expected risk can change. SPOT defines a tau-safe set of prompts, those whose reference risk is at most tau, and intervenes by projecting a prompt toward nearby prompts in this set. The paper is available on arXiv (ID 2602.00616).
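To illustrate how a total variation bound constrains risk, the following is a standard inequality, not necessarily the exact statement in the paper. The notation is assumed here for illustration: r is a risk function with values in [0, 1], p is the prompt-conditioned output distribution, and q is the output distribution after an intervention.

```latex
\[
\left| \mathbb{E}_{x \sim q}[r(x)] - \mathbb{E}_{x \sim p}[r(x)] \right|
\;\le\; \mathrm{TV}(p, q),
\qquad
\mathrm{TV}(p, q) = \sup_{A} \left| p(A) - q(A) \right| .
\]
```

Read in the direction relevant to SPAT: if an intervention lowers expected risk by some amount delta, then TV(p, q) must be at least delta, so a safety gain cannot come without a matching deviation from the prompt-conditioned distribution.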
Key facts
- SPOT is an inference-time framework for safe text-to-image generation.
- It uses total variation (TV) to bound changes in expected risk.
- The Safety-Prompt Alignment Tradeoff (SPAT) is introduced.
- A tau-safe set of prompts is defined based on reference risk.
- Intervention is cast as projection toward nearby prompts in the safe set.
- The method works with frozen diffusion models.
- The paper is available on arXiv (ID 2602.00616).
- The approach is designed for selective and adjustable safety.
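The projection idea in the facts above can be sketched with a toy example. This is a minimal illustration under stated assumptions, not the paper's algorithm: the `Candidate` type, the `project_to_safe_set` function, the finite candidate pool, the Euclidean distance on hand-written embeddings, and the risk table are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical prompt paired with a toy embedding vector."""
    text: str
    embedding: Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project_to_safe_set(
    prompt: Candidate,
    candidates: Sequence[Candidate],
    risk: Callable[[Candidate], float],
    tau: float,
) -> Candidate:
    """Return the nearest candidate whose risk is at most tau.

    Selective: a prompt that is already tau-safe is returned unchanged.
    Adjustable: a larger tau admits more prompts and moves the prompt less.
    """
    if risk(prompt) <= tau:
        return prompt
    safe = [c for c in candidates if risk(c) <= tau]
    if not safe:
        raise ValueError("no candidate prompt satisfies the risk threshold")
    return min(safe, key=lambda c: euclidean(c.embedding, prompt.embedding))


# Toy data (assumed for illustration): made-up risk scores and 2-D embeddings.
toy_risk = {
    "violent battle scene": 0.8,
    "dramatic battle scene": 0.3,
    "peaceful meadow": 0.05,
}


def risk_of(c: Candidate) -> float:
    """Look up the made-up reference risk of a candidate prompt."""
    return toy_risk[c.text]


prompt = Candidate("violent battle scene", (1.0, 0.0))
pool = [
    Candidate("dramatic battle scene", (0.9, 0.2)),
    Candidate("peaceful meadow", (0.0, 1.0)),
]

print(project_to_safe_set(prompt, pool, risk_of, tau=0.4).text)
```

With `tau=0.4` both pool entries are safe and the closer one is chosen; tightening to `tau=0.1` leaves only the more distant prompt, and loosening to `tau=0.9` returns the original prompt untouched. A real system would need a learned risk estimator and a search over a continuous prompt or embedding space rather than a fixed pool.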
Entities
Institutions
- arXiv