Entropic Deviation: Measuring Non-Randomness in LLM Token Distributions
A new paper introduces Entropic Deviation (ED) to quantify non-randomness in language model token distributions. Analyzing 31,200 generations across seven models, two architectures, nine prompt categories, three temperatures, and five languages, the study finds that under neutral prompts, transformers exhibit ED of ~0.30, with 88-93% of that non-randomness intrinsic to the learned weights. Three transformer families (Gemma, Llama, Qwen) converge on nearly identical ED values. The state space model Mamba2 displays roughly twice the ED of the transformers, lower within-sequence variance, and much greater sensitivity to temperature.
Key facts
- Paper introduces Entropic Deviation (ED) as a measure of non-randomness in token distributions.
- Analyzed 31,200 generations across seven models.
- Two architectures studied: transformer and state space.
- Nine prompt categories, three temperatures, five languages.
- Under neutral prompts, transformers show ED ~0.30.
- 88-93% of non-randomness is intrinsic to learned weights.
- Gemma, Llama, Qwen converge on nearly identical ED values.
- Mamba2 shows roughly twice the ED of the transformers and about one-third of their within-sequence variance.
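The summary does not give the paper's exact formula for ED. One plausible reading, sketched below as an assumption rather than the authors' definition, measures how far a token distribution's Shannon entropy falls below the uniform maximum, normalized to [0, 1]: 0 for a perfectly uniform (maximally random) distribution, approaching 1 for a near-deterministic one. The function name `entropic_deviation` is hypothetical.

```python
import math

def entropic_deviation(probs):
    """Illustrative ED: deviation of Shannon entropy from the uniform
    maximum over the same support, normalized to [0, 1].

    This is an assumed definition for illustration; the paper's actual
    metric may be computed differently.
    """
    v = len(probs)
    # Shannon entropy in nats; skip zero-probability tokens.
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Maximum entropy for a v-token vocabulary is log(v).
    return 1.0 - h / math.log(v)

# A uniform distribution is maximally random, so ED is 0.
print(entropic_deviation([0.25, 0.25, 0.25, 0.25]))

# A sharply peaked distribution is far from random, so ED is close to 1.
print(round(entropic_deviation([0.97, 0.01, 0.01, 0.01]), 3))
```

Under this reading, the reported ED ~0.30 for transformers would mean their token distributions carry about 70% of the maximum possible entropy, and Mamba2's doubled ED would indicate substantially more peaked distributions.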