D²-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing
A new safety monitoring method, D²-Monitor, has been proposed for diffusion large language models (D-LLMs), which generate text through a multi-step denoising process. Unlike autoregressive LLMs, D-LLMs expose intermediate hidden representations at each denoising step, and these may contain safety-relevant information. The authors identify 'safety hesitation', in which intermediate hidden states repeatedly fall near the probe's decision boundary, as a key signal that predicts when the probe will fail. D²-Monitor uses a bi-level routing strategy that allocates monitoring resources dynamically based on this hesitation signal. The paper is available on arXiv (ID 2605.25893).
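To make the hesitation signal concrete, here is a minimal sketch, assuming a linear probe with a sigmoid output read as P(unsafe), so that the decision boundary sits at probability 0.5. A denoising step is counted as "hesitant" when the probe's probability lies within a margin of that boundary. The function names `probe_probability` and `count_hesitation_steps` and the `margin` value are hypothetical illustrations, not the paper's definitions. The paper may, for example, measure distance to the boundary in logit space instead.

```python
import math
from typing import Sequence


def probe_probability(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear probe: sigmoid(w . h + b), interpreted as P(unsafe)."""
    logit = sum(w * h for w, h in zip(weights, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def count_hesitation_steps(step_probs: Sequence[float], margin: float = 0.1) -> int:
    """Count denoising steps whose probe probability falls within `margin` of 0.5.

    Assumption: "near the decision boundary" means |p - 0.5| < margin.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    return sum(1 for p in step_probs if abs(p - 0.5) < margin)


if __name__ == "__main__":
    # Probe outputs collected at five denoising steps (made-up values).
    probs = [0.9, 0.55, 0.48, 0.62, 0.52]
    print(count_hesitation_steps(probs))  # steps 2, 3 and 5 are hesitant
```

With the default margin of 0.1, three of the five example steps count as hesitant. A larger count would, under this sketch's assumptions, indicate a less reliable probe verdict.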
Key facts
- D²-Monitor is a dynamic safety monitoring method for diffusion LLMs.
- Diffusion LLMs generate text via multi-step denoising, exposing intermediate hidden states.
- Safety hesitation is defined as hidden states repeatedly near the probe's decision boundary.
- The number of hesitation steps is an effective predictor of probe failure.
- D²-Monitor uses bi-level routing for resource allocation.
- The paper is available on arXiv with ID 2605.25893.
- The method is motivated by the use of lightweight probes for always-on safety monitoring.
- The research addresses a gap in safety monitoring for D-LLMs.
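The summary does not describe how the bi-level routing works internally. The sketch below is therefore only a hypothetical two-tier rule showing how a hesitation count could drive resource allocation. At the first level, the output stays with the cheap probe unless hesitation reaches a threshold. At the second level, escalated outputs go to a stronger monitor, with the choice depending on the fraction of hesitant steps. The names `route`, `RoutingDecision`, `escalate_threshold`, `heavy_fraction`, and the monitor labels are all assumptions, not D²-Monitor's actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    escalate: bool  # whether to go beyond the lightweight probe
    monitor: str    # hypothetical monitor tier: "probe", "medium" or "heavy"


def route(hesitation_steps: int, total_steps: int,
          escalate_threshold: int = 2, heavy_fraction: float = 0.5) -> RoutingDecision:
    """Hypothetical two-level routing driven by the hesitation count."""
    if total_steps <= 0 or not 0 <= hesitation_steps <= total_steps:
        raise ValueError("need 0 <= hesitation_steps <= total_steps and total_steps > 0")
    # Level 1: trust the cheap always-on probe when it rarely hesitates.
    if hesitation_steps < escalate_threshold:
        return RoutingDecision(escalate=False, monitor="probe")
    # Level 2: among escalated cases, send heavily hesitant ones to the strongest monitor.
    if hesitation_steps / total_steps >= heavy_fraction:
        return RoutingDecision(escalate=True, monitor="heavy")
    return RoutingDecision(escalate=True, monitor="medium")


if __name__ == "__main__":
    for h in (1, 3, 6):
        print(h, route(h, total_steps=10))
```

This design keeps the lightweight probe as the default path, which fits the always-on motivation. Extra compute goes only to outputs on which the probe wavers.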
Entities
Institutions
- arXiv