Token-to-Mask Remasking Improves Discrete Diffusion Language Models
Token-to-Mask (T2M) remasking is a new training-free method that addresses limitations of token editing in discrete masked diffusion language models such as LLaDA. It replaces the Token-to-Token (T2T) editing mechanism introduced in LLaDA2.1, which directly replaces suspected erroneous tokens with new ones. T2M instead resets such tokens to the mask state, so the diffusion process can re-predict them under a cleaner context. The approach decouples error detection from replacement, avoids polluting the generation context, and eliminates the train-inference noise mismatch caused by systematic model-generated errors. The authors design and empirically validate three complementary error detection strategies. The paper is available on arXiv under identifier 2605.26436.
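The detect, remask, re-predict loop described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `MASK` id, the helper names, the confidence threshold used as the detector, and the `predict` callable standing in for a diffusion model. The summary does not say what the paper's three detection strategies are, so thresholding on confidence should not be read as one of them.

```python
from typing import Callable, List, Sequence

MASK = -1  # hypothetical mask token id, chosen only for this sketch


def detect_low_confidence(confidences: Sequence[float], threshold: float) -> List[int]:
    """Return positions whose confidence falls below the threshold.

    Assumed stand-in detector; not taken from the paper.
    """
    return [i for i, c in enumerate(confidences) if c < threshold]


def remask(tokens: Sequence[int], positions: Sequence[int]) -> List[int]:
    """T2M-style step: reset suspected positions to MASK instead of editing them."""
    out = list(tokens)
    for i in positions:
        out[i] = MASK
    return out


def refill(tokens: Sequence[int], predict: Callable[[Sequence[int], int], int]) -> List[int]:
    """Re-predict every masked position from the remasked context.

    Each prediction sees `tokens` with the suspect entries already masked,
    so the suspect tokens no longer appear in the context.
    """
    out = list(tokens)
    for i, t in enumerate(tokens):
        if t == MASK:
            out[i] = predict(tokens, i)
    return out


# Toy usage with a placeholder predictor (a real model would go here).
tokens = [5, 9, 2, 7]
confidences = [0.95, 0.20, 0.88, 0.40]

suspects = detect_low_confidence(confidences, threshold=0.5)
remasked = remask(tokens, suspects)
refilled = refill(remasked, predict=lambda ctx, i: 100 + i)
```

Detection and replacement are separate functions here, so the detector can be swapped without touching how masked positions are re-predicted, which mirrors the decoupling the summary attributes to T2M.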
Key facts
- Token-to-Mask (T2M) remasking is a training-free method.
- T2M replaces Token-to-Token (T2T) editing in discrete masked diffusion models.
- T2M resets suspected erroneous tokens to the mask state.
- T2M addresses three limitations of T2T editing: error detection coupled with replacement, context pollution, and train-inference noise mismatch.
- Three complementary error detection strategies are proposed and validated.
- The paper is available on arXiv: 2605.26436.
- LLaDA is a discrete masked diffusion language model.
- LLaDA2.1 introduced T2T editing.
Entities
Institutions
- arXiv