Safety-Aware Denoiser for Text Diffusion Models

ai-technology · 2026-05-12

A new framework called Safety-Aware Denoiser (SAD) addresses safety risks in text diffusion models, an alternative to autoregressive generation. Existing safety methods rely on post-hoc filtering or inference-time interventions designed for autoregressive models, and they prove inadequate for diffusion models. SAD modifies the iterative denoising process to steer final text samples toward provably safe regions, applying safety constraints at inference time without retraining. Safety is evaluated using a hazard taxonomy and memorization metrics.

Key facts

SAD is a safety-guidance framework for text diffusion models.
It modifies the denoising process to ensure safe text generation.
Existing safety approaches are designed for autoregressive models and are inadequate for diffusion models.
SAD avoids computationally expensive retraining.
It uses inference-time safety constraints.
Safety evaluation uses a hazard taxonomy and memorization metrics.
The method is lightweight and flexible.
It steers samples toward provably safe text regions.
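
The general idea of steering a denoising loop at inference time can be illustrated with a toy example. The sketch below is not SAD itself and makes no claim about the paper's algorithm: the vocabulary, the `UNSAFE` token set, `toy_denoiser_logits`, `safety_penalty`, and the `scale` parameter are all hypothetical stand-ins chosen for illustration. It assumes a masked-token style of text diffusion in which each step fills in a few masked positions, and it applies a safety penalty to the denoiser's logits before sampling, so no retraining is involved.

```python
import math
import random

# Hypothetical toy setup: a five-word vocabulary with one "unsafe" token.
VOCAB = ["the", "cat", "sat", "quietly", "BAD"]
UNSAFE = {"BAD"}
MASK = "<mask>"

def toy_denoiser_logits(seq, pos):
    """Stand-in for a trained denoiser: one logit per vocabulary token.
    Deliberately biased toward the unsafe token so guidance is visible."""
    return [0.0, 0.0, 0.0, 0.0, 2.0]

def safety_penalty(token):
    """Hypothetical safety score: 1.0 for unsafe tokens, 0.0 otherwise."""
    return 1.0 if token in UNSAFE else 0.0

def guided_logits(logits, scale):
    """Subtract a scaled safety penalty from each token's logit."""
    return [l - scale * safety_penalty(t) for l, t in zip(logits, VOCAB)]

def sample(logits, rng):
    """Sample one token from the softmax of the logits."""
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    return rng.choices(VOCAB, weights=weights, k=1)[0]

def generate(length=8, steps=4, scale=0.0, seed=0):
    """Start fully masked and unmask a few positions per denoising step."""
    rng = random.Random(seed)
    seq = [MASK] * length
    per_step = math.ceil(length / steps)
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        for pos in rng.sample(masked, min(per_step, len(masked))):
            logits = guided_logits(toy_denoiser_logits(seq, pos), scale)
            seq[pos] = sample(logits, rng)
    return seq

if __name__ == "__main__":
    print("unguided:", " ".join(generate(scale=0.0)))
    print("guided:  ", " ".join(generate(scale=1e9)))
```

With `scale=0.0` the loop samples from the unmodified denoiser; with a very large `scale` the unsafe token's sampling weight underflows to zero, so it cannot be chosen. A real system would replace the fixed token set with a learned or rule-based safety signal, and any guarantee about "provably safe" regions would depend on details of the actual method that this sketch does not model.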

Entities

—

Sources

arXiv cs.AI — 2026-05-12