Atom Theory: Defining Fundamental Units of LLMs
A new paper on arXiv introduces Atom Theory, a framework to define, evaluate, and identify the fundamental representational units (FRUs) of large language models (LLMs), which the authors call atoms. They propose the atomic inner product (AIP), a non-Euclidean metric intended to capture the underlying geometry of LLM representations, and set two criteria for ideal atoms: faithfulness, measured by R², and stability, measured by q*. They prove that atoms are identifiable under threshold-activated sparse autoencoders (TSAEs). Empirically, they report a pervasive representation shift in LLMs, which AIP corrects. By these criteria, neither neurons nor features qualify as ideal atoms: neurons are faithful (R²=1) but unstable (q*=0.5%).
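To illustrate why neurons score R²=1 on faithfulness, here is a minimal sketch that uses the standard Euclidean coefficient of determination. This is an assumption: the paper may define R² differently, for example under AIP. The helper names `r_squared` and `reconstruct` and the toy activations are hypothetical. With one atom per neuron (the standard basis), every activation vector is reconstructed exactly, so R² is 1; dropping an atom lowers it.

```python
def r_squared(originals, reconstructions):
    """Standard coefficient of determination over a set of vectors (assumed definition)."""
    n = len(originals)
    dim = len(originals[0])
    # Per-dimension mean over the dataset, used as the baseline predictor.
    mean = [sum(x[j] for x in originals) / n for j in range(dim)]
    ss_res = sum((x[j] - r[j]) ** 2
                 for x, r in zip(originals, reconstructions)
                 for j in range(dim))
    ss_tot = sum((x[j] - mean[j]) ** 2 for x in originals for j in range(dim))
    return 1.0 - ss_res / ss_tot


def reconstruct(codes, atoms):
    """Linear reconstruction: sum over i of codes[i] * atoms[i]."""
    dim = len(atoms[0])
    return [sum(c * a[j] for c, a in zip(codes, atoms)) for j in range(dim)]


# Toy activations (hypothetical data), three samples of a 3-neuron layer.
activations = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0], [2.0, 0.5, 1.0]]

# Neurons as atoms: the standard basis, with the activations themselves as codes.
neuron_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

full = [reconstruct(x, neuron_atoms) for x in activations]
r2_full = r_squared(activations, full)  # exact reconstruction

# Using only the first two atoms loses the third coordinate.
partial = [reconstruct(x[:2], neuron_atoms[:2]) for x in activations]
r2_partial = r_squared(activations, partial)

print(r2_full, r2_partial)
```

Perfect reconstruction says nothing about stability, which is the criterion on which the paper reports neurons falling short.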
Key facts
- Atom Theory defines fundamental representational units (FRUs) of LLMs as atoms.
- Atomic inner product (AIP) is a non-Euclidean metric for LLM representations.
- Two criteria for ideal atoms: faithfulness (R²) and stability (q*).
- Atoms are identifiable under threshold-activated sparse autoencoders (TSAEs).
- The authors report a pervasive representation shift in LLMs.
- AIP corrects the representation shift to capture underlying geometry.
- Neurons are faithful (R²=1) but unstable (q*=0.5%).
- Features also fail to qualify as ideal atoms.
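As a rough picture of what "threshold-activated" can mean in a sparse autoencoder, the sketch below keeps a latent code only when its pre-activation exceeds a per-latent threshold, then reconstructs the input as a weighted sum of decoder atoms. This is one common reading of threshold activation, not the paper's confirmed TSAE definition. The function names, weights, and thresholds are all hypothetical.

```python
def threshold_activate(pre_activations, thresholds):
    """Keep a value only if it exceeds its threshold; otherwise output 0 (assumed rule)."""
    return [p if p > t else 0.0 for p, t in zip(pre_activations, thresholds)]


def tsae_forward(x, enc_weights, enc_bias, thresholds, dec_atoms):
    """Encode x to sparse codes, then decode as a linear combination of atoms."""
    # Encoder: one affine pre-activation per latent.
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(enc_weights, enc_bias)]
    codes = threshold_activate(pre, thresholds)
    # Decoder: reconstruction is the code-weighted sum of atom vectors.
    recon = [sum(c * atom[j] for c, atom in zip(codes, dec_atoms))
             for j in range(len(x))]
    return codes, recon


# Hypothetical toy parameters: two latents over a 2-dimensional input.
enc_weights = [[1.0, 0.0], [0.0, 1.0]]
enc_bias = [0.0, 0.0]
thresholds = [0.5, 0.5]
dec_atoms = [[1.0, 0.0], [0.0, 1.0]]

codes, recon = tsae_forward([1.0, 0.2], enc_weights, enc_bias, thresholds, dec_atoms)
print(codes, recon)
```

In this toy run the second latent stays below its threshold and is zeroed, so only the first atom contributes to the reconstruction.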
Entities
Institutions
- arXiv