Digit Entropy Loss Improves LLM Number Prediction
A new paper proposes Digit Entropy Loss (DEL), a method for improving numerical learning in large language models (LLMs). Number prediction is crucial for mathematical problem-solving and code generation, but standard maximum likelihood estimation (MLE) is not tailored for numbers. Existing penalty-driven approaches such as Number Token Loss and Discretized Distance Loss introduce an inductive bias, but they produce digit distributions that are either over-sharpened or over-flattened. The paper gives an in-depth analysis of LLM numerical learning and shows that these current methods follow a criterion-distance formulation. DEL instead reformulates unsupervised entropy optimization through three key designs, using digit-level information to enhance auto-regressive numerical learning. The aim is to balance optimization against geometric priors for better number prediction.
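To make "over-sharpened" and "over-flattened" concrete, the sketch below measures the Shannon entropy of a probability distribution over the ten digits at a single position. It is only an illustration of the quantity that entropy-based objectives work with, not the DEL loss itself. The function name `digit_entropy` and the three example distributions are assumptions made for this example.

```python
import math


def digit_entropy(probs):
    """Shannon entropy (in nats) of a distribution over the digits 0-9."""
    if len(probs) != 10:
        raise ValueError("expected one probability per digit 0-9")
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability digits contribute nothing (p * log p -> 0 as p -> 0).
    return sum(-p * math.log(p) for p in probs if p > 0.0)


# Hypothetical predicted distributions for one digit position.
sharp = [0.0] * 10
sharp[7] = 1.0  # all mass on digit 7: minimum entropy
peaked = [0.0, 0.0, 0.0, 0.0, 0.0, 0.05, 0.15, 0.6, 0.15, 0.05]  # mass near 7
flat = [0.1] * 10  # uniform over all digits: maximum entropy

for name, dist in [("sharp", sharp), ("peaked", peaked), ("flat", flat)]:
    print(f"{name:>6}: {digit_entropy(dist):.3f} nats")
```

Entropy ranges from 0 for a one-hot distribution to ln(10) for a uniform one, so an over-sharpened prediction sits near the low end of that range and an over-flattened one near the high end.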
Key facts
- DEL stands for Digit Entropy Loss
- The paper is on arXiv under ID 2605.20369
- Number prediction is fundamental for LLMs in math and code
- MLE is not tailored for number prediction
- Number Token Loss and Discretized Distance Loss are existing methods
- Existing methods cause over-sharpened or over-flattened digit distributions
- DEL reformulates unsupervised entropy optimization
- DEL uses three key designs to enhance auto-regressive numerical learning
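The claim that MLE is not tailored for numbers can be illustrated with a small example: per-token cross-entropy depends only on the probability given to the correct digit, so it cannot tell a near miss from a far miss. The sketch below compares it with a simple expected-distance term. That term is a generic stand-in chosen for illustration. It is not the definition of Number Token Loss, Discretized Distance Loss, or DEL, and the function names are assumptions.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target digit (per-token MLE objective)."""
    return -math.log(probs[target])


def expected_abs_distance(probs, target):
    """Illustrative distance term: expected |digit - target| under probs."""
    return sum(p * abs(digit - target) for digit, p in enumerate(probs))


target = 9

# Hypothetical predictions: both put 0.5 on the correct digit 9.
near = [0.0] * 10
near[9], near[8] = 0.5, 0.5  # remaining mass on the neighbouring digit 8
far = [0.0] * 10
far[9], far[0] = 0.5, 0.5  # remaining mass on the distant digit 0

# Cross-entropy is identical for both predictions.
print(cross_entropy(near, target), cross_entropy(far, target))
# The distance term separates the near miss from the far miss.
print(expected_abs_distance(near, target), expected_abs_distance(far, target))
```

Both predictions receive the same cross-entropy, while the distance term is nine times larger for the prediction that places its remaining mass on 0. This is the kind of numeric prior that penalty-driven approaches add on top of MLE.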
Entities
Institutions
- arXiv