Semantic Information Theory for LLMs: From BIT to TOKEN
A new paper proposes a semantic information theory for large language models (LLMs), shifting from the classical BIT to the TOKEN as the fundamental unit of meaning. The theory synthesizes statistical physics, continuous signal processing, and classical information theory to give LLMs a rigorous foundation, in contrast to current research, which the paper describes as heuristic and experimentally driven. Its stated aim is to dismantle the epistemological black box of LLMs by establishing first principles.
Key facts
- The paper is titled 'Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs'
- It is available on arXiv under the ID 2511.01202
- The theory proposes a paradigm shift from BIT to TOKEN as the atomic carrier of meaning
- It synthesizes statistical physics, continuous signal processing, and classical information theory
- The work aims to provide a rigorous theoretical elucidation of LLMs
- Current LLM research is described as heuristic and experimentally driven
- The framework is intended to dismantle the epistemological black box of LLMs
Entities
Institutions
- arXiv