DiffScore: A New Text Evaluation Method Using Masked Diffusion Language Models
A new research paper presents DiffScore, a text-evaluation framework that replaces autoregressive likelihood with masked reconstruction under a masked diffusion language model, addressing the positional bias of autoregressive scorers. Because every token is assessed with full bidirectional context, the left-to-right factorization imbalance disappears. By measuring how recoverable text is across a continuum of masking rates, DiffScore builds an evaluation hierarchy that spans local fluency to global coherence (a minimal scoring sketch follows this paragraph). It also provides diagnostic tools, including multi-timestep quality profiles and a bidirectional PMI decomposition that separates fluency from faithfulness. Experiments across ten benchmarks support the method's effectiveness.
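To make the core mechanism concrete, here is a minimal, illustrative sketch of masked-reconstruction scoring. It uses a BERT-style masked LM from Hugging Face `transformers` as a stand-in for the paper's masked diffusion model; the function name, masking scheme, and aggregation are assumptions for illustration, not DiffScore's actual implementation:

```python
# Illustrative sketch only: score text by masked reconstruction at a given
# masking rate. A BERT-style masked LM stands in for the paper's masked
# diffusion model; the scoring details below are assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def recoverability(text: str, mask_rate: float, seed: int = 0) -> float:
    """Mean log-prob of masked tokens given full bidirectional context."""
    ids = tokenizer(text, return_tensors="pt")["input_ids"]
    # Mask a random subset of non-special tokens.
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(
            ids[0].tolist(), already_has_special_tokens=True
        ),
        dtype=torch.bool,
    )
    candidates = (~special).nonzero(as_tuple=True)[0]
    g = torch.Generator().manual_seed(seed)
    n_mask = max(1, int(mask_rate * len(candidates)))
    masked_pos = candidates[torch.randperm(len(candidates), generator=g)[:n_mask]]
    corrupted = ids.clone()
    corrupted[0, masked_pos] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(corrupted).logits
    # Log-probability assigned to each gold token at its masked position.
    log_probs = torch.log_softmax(logits[0, masked_pos], dim=-1)
    gold = ids[0, masked_pos].unsqueeze(-1)
    return log_probs.gather(-1, gold).mean().item()

# Sweep the masking rate: low rates probe local fluency (most context is
# visible), high rates probe global coherence (large spans must be
# reconstructed from sparse anchors).
profile = {r: recoverability("The cat sat on the mat.", r) for r in (0.15, 0.5, 0.85)}
```

Sweeping the rate this way mirrors the local-to-global evaluation hierarchy described above: each rate yields one point on a recoverability profile rather than a single scalar score.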
Key facts
- DiffScore uses masked reconstruction instead of autoregressive likelihood
- It eliminates positional bias by using full bidirectional context
- It measures text recoverability across continuous masking rates
- It provides multi-timestep quality profiles and a bidirectional PMI decomposition (one possible form of the decomposition is sketched after this list)
- Experiments were conducted across ten benchmarks
- The paper is published on arXiv with ID 2605.11601
- The method is built on masked large diffusion language models
- It evaluates from local fluency to global coherence
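The summary does not spell out the bidirectional PMI decomposition, so the following LaTeX sketch shows one standard way such a source-conditioned PMI could be written; the symbols y_t, y_{\neg t}, and x (a candidate token, its bidirectional context, and the source text) are assumed notation, not the paper's:

```latex
% Hedged sketch: one conventional conditional-PMI identity, written for a
% masked model p that scores token y_t from bidirectional context y_{\neg t}.
\[
\mathrm{PMI}(y_t; x \mid y_{\neg t})
  = \underbrace{\log p(y_t \mid y_{\neg t},\, x)}_{\text{source-conditioned recoverability}}
  \;-\; \underbrace{\log p(y_t \mid y_{\neg t})}_{\text{context-only recoverability}}
\]
```

Under this reading, the second term captures what fluent context alone predicts, while the difference isolates the source's contribution, which is how a decomposition of this shape could separate fluency from faithfulness.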
Entities
Institutions
- arXiv