Entropy-Aware Masking Improves Masked Language Model Training

other · 2026-05-28

A new study on arXiv proposes an entropy-aware token masking strategy for masked language modeling (MLM), a standard pretraining objective for encoder-based language models. Instead of masking tokens at random, the method selects tokens based on the entropy of the model's predictions for each token. This targets more informative and uncertain tokens, with the aim of making training more effective. The authors also introduce a self-masking approach that improves training efficiency without requiring an external reference model. In experiments, the method improves average performance over conventional random masking. The paper is available under arXiv ID 2605.28526.
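The token-selection idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the model's per-token predictive distributions are already available as probability lists, and it assumes a simple top-k rule that masks the `mask_ratio` fraction of positions with the highest Shannon entropy. The names `token_entropy` and `select_mask_positions` and the default ratio of 0.15 are hypothetical choices, not details taken from the paper.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted distribution over the vocabulary."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_mask_positions(prob_rows: Sequence[Sequence[float]],
                          mask_ratio: float = 0.15) -> List[int]:
    """Return the positions whose predictions are most uncertain.

    Hypothetical rule (assumption, not from the paper): mask the top-k
    highest-entropy positions, with k = max(1, round(mask_ratio * n)).
    """
    n = len(prob_rows)
    k = max(1, round(mask_ratio * n))
    entropies = [token_entropy(row) for row in prob_rows]
    # Rank positions from highest to lowest entropy; the sort is stable,
    # so ties keep their original left-to-right order.
    ranked = sorted(range(n), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


# Example: four positions over a toy 4-token vocabulary.
probs = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.30, 0.20, 0.10],  # fairly spread out
]
print(select_mask_positions(probs, mask_ratio=0.5))  # prints [1, 3]
```

A deterministic top-k rule always masks the same positions for a given model state. A real system might instead sample positions with probability proportional to entropy. The summary does not say which rule the paper uses.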

Key facts

arXiv paper ID: 2605.28526
Proposes entropy-aware masking for MLM
Uses the model's entropy over token predictions to select tokens to mask
Aims to target more informative and uncertain tokens
Introduces a novel self-masking approach
Self-masking does not rely on an external reference model
Experiments show an improvement in average performance over random masking
Published on arXiv
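One plausible reading of self-masking is sketched below, under stated assumptions. The model being trained scores its own predictions on the unmasked sequence, the highest-entropy positions are masked, and the MLM loss is computed on those positions. Because the same model does the scoring, no second (reference) model is needed. The `Model` interface, the `[MASK]` token string, the top-k rule, `toy_model`, and all function names are hypothetical. The summary does not describe the paper's exact procedure.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (no real library API implied): a model maps a token
# sequence to one probability distribution over the vocabulary per position.
Model = Callable[[Sequence[str]], List[List[float]]]

MASK = "[MASK]"  # assumed mask-token string


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def self_masking_step(model: Model, tokens: Sequence[str], vocab: Sequence[str],
                      mask_ratio: float = 0.15) -> Tuple[List[str], List[int], float]:
    # 1) Score: the model being trained predicts on the unmasked input.
    #    (In a real training framework this pass would typically run without gradients.)
    scores = [entropy(row) for row in model(tokens)]
    # 2) Select the k most uncertain positions; no external reference model is used.
    k = max(1, round(mask_ratio * len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    positions = sorted(ranked[:k])
    chosen = set(positions)
    # 3) Replace the selected tokens with the mask token.
    masked = [MASK if i in chosen else t for i, t in enumerate(tokens)]
    # 4) MLM loss: mean cross-entropy of the original tokens at masked positions only.
    preds = model(masked)
    loss = -sum(math.log(preds[i][vocab.index(tokens[i])]) for i in positions) / len(positions)
    return masked, positions, loss


def toy_model(seq: Sequence[str]) -> List[List[float]]:
    # Hypothetical toy model: confident about "the", uniform everywhere else.
    return [[0.97, 0.01, 0.01, 0.01] if t == "the" else [0.25] * 4 for t in seq]


vocab = ["the", "cat", "sat", "mat"]
masked, positions, loss = self_masking_step(toy_model, ["the", "cat", "sat", "the"],
                                            vocab, mask_ratio=0.5)
print(masked, positions, round(loss, 3))  # prints ['the', '[MASK]', '[MASK]', 'the'] [1, 2] 1.386
```

Reusing the trained model as its own scorer avoids maintaining a separate reference network. The trade-off is that the masking pattern shifts as the model learns.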
