DACA-GRPO Enhances Reinforcement Learning for Diffusion Language Models

ai-technology · 2026-05-20

A new paper on arXiv (2605.16342) proposes DACA-GRPO (Denoising-Aware Credit Assignment for GRPO), a method to improve reinforcement learning in diffusion large language models. The authors identify two weaknesses in existing RL approaches: they perform no temporal credit assignment across denoising steps, treating every step as equally important, and they rely on mean-field likelihood estimates that are systematically biased. DACA-GRPO addresses the first with Denoising Progress Scores, which supply per-token importance weights, and the second with Stratified Masking Likelihood, which partitions token positions into strata to reduce bias. It is designed as a plug-and-play enhancement for GRPO-style trainers.
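To make the credit-assignment idea concrete, here is a minimal sketch of how per-token importance weights could modulate a GRPO-style advantage. It is not the paper's formulation: this summary does not give the Denoising Progress Score formula, so `token_scores` is a hypothetical stand-in, and the function names, the use of the population standard deviation, and the rescaling to an average weight of 1 are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward within its sampling group (GRPO-style).

    Assumption: population standard deviation; trainers differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def weight_advantage_per_token(advantage, token_scores):
    """Spread one sequence-level advantage across tokens.

    token_scores: hypothetical non-negative per-token importance scores.
    They are rescaled to average 1, so the per-token advantages sum to
    len(token_scores) * advantage, the same total as uniform credit.
    """
    total = sum(token_scores)
    n = len(token_scores)
    if total <= 0:
        return [advantage] * n  # fall back to uniform credit
    return [advantage * s * n / total for s in token_scores]


# Four completions sampled for one prompt, with binary rewards.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# Three tokens; the last one is scored as twice as important.
per_token = weight_advantage_per_token(1.0, [1.0, 1.0, 2.0])
```

With uniform scores the second function reduces to giving every token the same advantage, which corresponds to the behavior the summary attributes to existing methods.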

Key facts

arXiv paper 2605.16342 introduces DACA-GRPO
DACA-GRPO addresses temporal credit assignment in diffusion LLMs
Denoising Progress Scores extract per-token importance weights
Stratified Masking Likelihood partitions token positions into strata
Method is a plug-and-play enhancement for GRPO-style trainers
Existing RL methods treat all denoising steps as equally important
Mean-field likelihood estimates are systematically biased
DACA-GRPO requires no additional forward-pass cost
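The stratified partition mentioned in the list above can be sketched as follows. This is one plausible reading, not the paper's algorithm: positions are split into disjoint strata so that each position is masked exactly once, and the masked-token log-probabilities are summed over strata. The function names, the round-robin split, and the `score_masked` callback (a stand-in for a model call) are hypothetical.

```python
import random


def stratified_masks(num_positions, num_strata, seed=0):
    """Partition positions 0..num_positions-1 into disjoint strata.

    Positions are shuffled, then dealt round-robin, so stratum sizes
    differ by at most one and every position lands in exactly one stratum.
    """
    rng = random.Random(seed)
    order = list(range(num_positions))
    rng.shuffle(order)
    return [sorted(order[k::num_strata]) for k in range(num_strata)]


def stratified_log_likelihood(tokens, strata, score_masked):
    """Sum masked-token log-probabilities over all strata.

    score_masked(tokens, positions) is a hypothetical callback that masks
    the given positions and returns one log-probability per masked token.
    """
    total = 0.0
    for positions in strata:
        total += sum(score_masked(tokens, positions))
    return total


def toy_scorer(tokens, positions):
    """Toy stand-in for a model: every masked token gets log-prob -1.0."""
    return [-1.0 for _ in positions]


strata = stratified_masks(num_positions=10, num_strata=3)
estimate = stratified_log_likelihood(list(range(10)), strata, toy_scorer)
```

In this sketch the number of scorer calls equals the number of strata, so how it relates to the "no additional forward cost" claim depends on details the summary does not state.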
