Entropy-Aware Masking Improves Masked Language Model Training
A new study on arXiv proposes an entropy-aware token masking strategy for masked language modeling (MLM), a standard pretraining objective for encoder-based language models. Instead of masking tokens at random, the method selects tokens by the entropy of the model's predicted token distribution, so that masking concentrates on the more informative, more uncertain tokens. The authors also introduce a self-masking approach that improves training efficiency without requiring an external reference model. Experiments show improved average performance over conventional random masking. The paper is available under arXiv ID 2605.28526.
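To make the idea of entropy-based selection concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the top-k selection rule, the `mask_ratio` parameter, the helper names, and the toy probability distributions are all assumptions made for illustration. The paper's actual selection rule may differ, for example by sampling in proportion to entropy instead of taking the highest-entropy positions.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one predicted distribution over the vocabulary."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_mask_positions(pred_dists, mask_ratio=0.15):
    """Return the indices of the highest-entropy positions, in ascending order.

    pred_dists: one probability distribution per token position.
    mask_ratio: fraction of positions to mask (hypothetical parameter).
    """
    n_mask = max(1, round(len(pred_dists) * mask_ratio))
    entropies = [token_entropy(d) for d in pred_dists]
    ranked = sorted(range(len(pred_dists)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:n_mask])


def apply_mask(tokens, positions, mask_token="[MASK]"):
    """Replace the tokens at the chosen positions with the mask token."""
    chosen = set(positions)
    return [mask_token if i in chosen else t for i, t in enumerate(tokens)]


# Toy example: made-up distributions over a 4-word vocabulary stand in
# for a model's per-position predictions.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
pred_dists = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform prediction, maximum entropy
    [0.40, 0.30, 0.20, 0.10],
    [0.90, 0.05, 0.03, 0.02],
    [0.97, 0.01, 0.01, 0.01],
    [0.70, 0.10, 0.10, 0.10],
]

positions = select_mask_positions(pred_dists, mask_ratio=0.34)
masked = apply_mask(tokens, positions)
print(positions)  # [1, 2]
print(masked)     # ['the', '[MASK]', '[MASK]', 'on', 'the', 'mat']
```

In a self-masking setup as described in the summary, the distributions would presumably come from the model being trained rather than from an external reference model. How the paper obtains them in practice is not stated here.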
Key facts
- arXiv paper ID: 2605.28526
- Proposes entropy-aware masking for MLM
- Uses model's entropy over token predictions to select tokens
- Aims to target more informative and uncertain tokens
- Introduces a novel self-masking approach
- Self-masking does not rely on an external reference model
- Experimental results show improved average performance over conventional random masking
- Published on arXiv
Entities
Institutions
- arXiv