Shannon Scaling Law: A New Framework for LLM Capacity and Degradation

publication · 2026-05-25

Researchers have introduced the Shannon Scaling Law, a conceptual framework that interprets the training of Large Language Models (LLMs) as information transfer through a noisy channel, based on the Shannon-Hartley theorem. This view explains non-monotonic behaviors such as catastrophic overtraining and quantization-induced degradation, where performance declines even as compute increases. By mapping model parameters to channel bandwidth and training tokens to signal power, the framework describes the interplay between the learning signal and inherent noise, and identifies a fundamental Shannon capacity for LLMs. Increasing model size or data without maintaining an adequate signal-to-noise ratio (SNR) amplifies noise, so steady improvement gives way to a U-shaped degradation in performance. The paper includes experimental validation of this theory.
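
For readers unfamiliar with the underlying theorem, the following is a minimal sketch of the classical Shannon-Hartley capacity, C = B * log2(1 + S / N). It is not the paper's fitted scaling law. Reading bandwidth B as a stand-in for model parameters and signal power S as a stand-in for training tokens follows the mapping described above; the white-noise model N = N0 * B and the names `shannon_capacity`, `capacity_ceiling`, and `noise_density` are assumptions made only for illustration.

```python
import math


def shannon_capacity(bandwidth, signal_power, noise_density=1.0):
    """Classical Shannon-Hartley capacity: C = B * log2(1 + S / N).

    Assumption: white noise, so total noise power is N = N0 * B.
    This is the textbook channel formula, not the paper's law.
    """
    noise_power = noise_density * bandwidth
    return bandwidth * math.log2(1.0 + signal_power / noise_power)


def capacity_ceiling(signal_power, noise_density=1.0):
    """Limit of the capacity as bandwidth grows without bound: S / (N0 * ln 2)."""
    return signal_power / (noise_density * math.log(2))


if __name__ == "__main__":
    # Widen the channel while holding signal power fixed.
    for bandwidth in (1.0, 10.0, 100.0, 1000.0):
        print(bandwidth, shannon_capacity(bandwidth, signal_power=1.0))
    print("ceiling", capacity_ceiling(signal_power=1.0))
```

In this classical form, widening the channel at fixed signal power lowers the SNR, and capacity flattens toward a ceiling rather than growing without limit. The U-shaped degradation described above depends on the paper's own treatment of noise, which this summary does not specify.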

Key facts

Existing scaling laws for LLMs are predominantly monotonic power laws.
Non-monotonic phenomena include catastrophic overtraining and quantization-induced degradation.
The Shannon Scaling Law is grounded in the Shannon-Hartley theorem.
Model parameters are mapped to channel bandwidth.
Training tokens are mapped to signal power.
A fundamental Shannon capacity exists for LLMs.
Insufficient SNR leads to U-shaped performance degradation.
Experiments validate the proposed theory.
