ARTFEED — Contemporary Art Intelligence

DACA-GRPO Enhances Reinforcement Learning for Diffusion Language Models

ai-technology · 2026-05-20

A new paper on arXiv (2605.16342) proposes DACA-GRPO (Denoising-Aware Credit Assignment for GRPO), a method for improving reinforcement learning in diffusion large language models. The authors identify two weaknesses in existing RL approaches: they perform no temporal credit assignment, treating every denoising step as equally important, and they rely on mean-field likelihood estimates that are systematically biased. DACA-GRPO addresses the first with Denoising Progress Scores, which yield per-token importance weights, and the second with Stratified Masking Likelihood, which partitions token positions into strata to reduce the bias. It is designed as a plug-and-play enhancement for GRPO-style trainers.
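
To make the idea of per-token importance weights concrete, here is a minimal Python sketch of how such weights might plug into a GRPO-style advantage. It is not the paper's implementation: the function names are hypothetical, the group standardization is the usual GRPO form, and the `token_scores` input merely stands in for Denoising Progress Scores, whose actual definition this summary does not give.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_weighted_advantages(advantage, token_scores, eps=1e-8):
    """Spread one sequence-level advantage over tokens using importance scores.

    token_scores is a hypothetical list of non-negative per-token scores
    (an assumed stand-in for Denoising Progress Scores). Scores are rescaled
    to mean 1, so the summed signal equals that of uniform weighting.
    """
    avg = mean(token_scores)
    if avg <= eps:
        # Degenerate scores: fall back to uniform credit.
        return [advantage] * len(token_scores)
    return [advantage * s / avg for s in token_scores]


if __name__ == "__main__":
    advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print(token_weighted_advantages(advantages[0], [1.0, 3.0]))
```

Rescaling the scores to mean 1 is a design choice made for this sketch only: it keeps the overall update magnitude comparable to an unweighted trainer, which is one plausible reading of "plug-and-play".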

Key facts

  • arXiv paper 2605.16342 introduces DACA-GRPO
  • DACA-GRPO addresses temporal credit assignment in diffusion LLMs
  • Denoising Progress Scores extract per-token importance weights
  • Stratified Masking Likelihood partitions token positions into strata
  • Method is a plug-and-play enhancement for GRPO-style trainers
  • Existing RL methods treat all denoising steps as equally important
  • Mean-field likelihood estimates are systematically biased
  • DACA-GRPO adds no extra forward-pass cost
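
The stratified masking idea in the list above can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the strata are formed by a seeded random shuffle (the paper's actual stratification rule is not stated in this summary), and `token_logprob` is a hypothetical callback standing in for a model call that scores one position while a given set of positions is masked. The sketch shows only the partitioning; it does not show how the method avoids additional forward cost.

```python
import random


def stratified_masks(seq_len, num_strata, seed=0):
    """Partition positions 0..seq_len-1 into num_strata disjoint strata.

    Assumption: strata are drawn by a seeded shuffle followed by
    round-robin assignment, so every position lands in exactly one stratum.
    """
    rng = random.Random(seed)
    positions = list(range(seq_len))
    rng.shuffle(positions)
    return [sorted(positions[i::num_strata]) for i in range(num_strata)]


def stratified_log_likelihood(token_logprob, seq_len, num_strata, seed=0):
    """Sum per-token log-probabilities, masking one stratum at a time.

    token_logprob(pos, masked) is a hypothetical callback returning the
    log-probability of the token at pos when the positions in masked are hidden.
    """
    total = 0.0
    for stratum in stratified_masks(seq_len, num_strata, seed):
        masked = set(stratum)
        total += sum(token_logprob(pos, masked) for pos in stratum)
    return total


if __name__ == "__main__":
    # Toy scorer: every token gets log-probability -0.5 regardless of the mask.
    print(stratified_log_likelihood(lambda pos, masked: -0.5, seq_len=10, num_strata=3))
```

Because the strata are disjoint and cover the sequence, each token is scored exactly once, which is the property that separates this scheme from sampling a single random mask.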

Entities

Institutions

  • arXiv
