Digit Entropy Loss Improves LLM Number Prediction

ai-technology · 2026-05-22

A new paper proposes Digit Entropy Loss (DEL), a training objective intended to improve numerical learning in large language models (LLMs). Number prediction is crucial for mathematical problem-solving and code generation, but standard maximum likelihood estimation (MLE) is not tailored to numbers. Existing penalty-driven approaches such as Number Token Loss and Discretized Distance Loss introduce an inductive bias, but they leave digit distributions either over-sharpened or over-flattened. The paper analyzes LLM numerical learning in depth and shows that current methods follow a criterion-distance formulation. DEL instead reformulates unsupervised entropy optimization with three key designs, using digit-level information to enhance auto-regressive numerical learning. The aim is to balance optimization against geometric priors for better number prediction.
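
To make the claim that MLE is not tailored to numbers concrete, here is a minimal Python sketch. It is an illustration only, not the paper's formulation: the helper names (`cross_entropy`, `expected_abs_distance`, `digit_dist`) and the toy distributions are assumptions introduced here. It compares two predictions for a target digit of 5, one that puts its stray probability on the neighbouring digit 4 and one that puts it on the distant digit 0.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target digit (the usual MLE criterion)."""
    return -math.log(probs[target])

def expected_abs_distance(probs, target):
    """Probability-weighted absolute distance between each digit and the target.
    A generic distance-style penalty, used here only for illustration."""
    return sum(p * abs(d - target) for d, p in enumerate(probs))

def digit_dist(mass):
    """Build a length-10 distribution over digits 0-9 from a {digit: prob} dict."""
    probs = [0.0] * 10
    for digit, p in mass.items():
        probs[digit] = p
    return probs

target = 5
near_miss = digit_dist({5: 0.5, 4: 0.5})  # stray mass on an adjacent digit
far_miss = digit_dist({5: 0.5, 0: 0.5})   # stray mass on a distant digit

ce_near = cross_entropy(near_miss, target)
ce_far = cross_entropy(far_miss, target)
dist_near = expected_abs_distance(near_miss, target)
dist_far = expected_abs_distance(far_miss, target)

print(f"cross-entropy: near={ce_near:.4f} far={ce_far:.4f}")
print(f"expected distance: near={dist_near:.2f} far={dist_far:.2f}")
```

Cross-entropy looks only at the probability assigned to the correct digit, so it scores both predictions identically, while the distance-style term separates them. That gap is the kind of missing numerical prior that penalty-driven methods try to supply.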

Key facts

DEL stands for Digit Entropy Loss
Paper is on arXiv with ID 2605.20369
Number prediction is fundamental for LLMs in math and code
MLE is not tailored for number prediction
Number Token Loss and Discretized Distance Loss are existing methods
Existing methods cause over-sharpened or over-flattened digit distributions
DEL reformulates unsupervised entropy optimization
DEL uses three key designs that draw on digit-level information for auto-regressive numerical learning
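
The terms "over-sharpened" and "over-flattened" can be read in terms of the Shannon entropy of a distribution over the ten digits. The sketch below is a hedged illustration of that quantity only; it does not implement DEL or any of its three designs, and the function name `digit_entropy` and the example distributions are assumptions made for this example.

```python
import math

def digit_entropy(probs):
    """Shannon entropy (in nats) of a distribution over the digits 0-9.
    Zero-probability digits contribute nothing to the sum."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Fully sharpened: all mass on a single digit.
one_hot = [0.0] * 10
one_hot[7] = 1.0

# Fully flattened: every digit equally likely.
uniform = [0.1] * 10

# In between: most mass on one digit, a little spread over the rest.
peaked = [0.02] * 10
peaked[7] = 0.82

h_sharp = digit_entropy(one_hot)
h_flat = digit_entropy(uniform)
h_mid = digit_entropy(peaked)

print(f"one-hot: {h_sharp:.4f}  peaked: {h_mid:.4f}  uniform: {h_flat:.4f}")
```

Entropy is zero for the one-hot case and reaches its maximum, log(10) nats, for the uniform case, so an entropy-based objective has a direct handle on how sharp or flat each digit distribution is.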
