TIDE: Novel Transformer Architecture with EmbeddingMemory
A recent research article introduces TIDE, an augmentation for transformers that tackles two structural issues in large language models: the Rare Token Problem and the Contextual Collapse Problem. The Rare Token Problem stems from the Zipf distribution of vocabularies: infrequent tokens receive inadequate gradient signal during training. The Contextual Collapse Problem arises when limited parameter capacity forces similar tokens onto nearly indistinguishable hidden states. TIDE adds EmbeddingMemory: K independent MemoryBlocks that map token indices to context-free semantic vectors. A depth-conditioned softmax router, equipped with a learnable null bank, injects these vectors into every layer. The paper is available as arXiv:2605.06216.
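The article does not give implementation details, so the following is a minimal PyTorch sketch of how EmbeddingMemory could look based on the description above: K MemoryBlocks as plain embedding tables, a learnable null bank, and per-layer softmax routing weights. The class names, shapes, and routing parameterization are all assumptions for illustration, not the paper's actual code.

```python
# Minimal sketch of the EmbeddingMemory idea; names and wiring are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemoryBlock(nn.Module):
    """One context-free lookup table: token index -> semantic vector."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.table = nn.Embedding(vocab_size, d_model)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # (batch, seq) -> (batch, seq, d_model); no attention, no context.
        return self.table(token_ids)


class EmbeddingMemory(nn.Module):
    """K independent MemoryBlocks plus a learnable null bank, mixed per
    layer by a depth-conditioned softmax router."""

    def __init__(self, vocab_size: int, d_model: int, num_blocks: int,
                 num_layers: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            MemoryBlock(vocab_size, d_model) for _ in range(num_blocks)
        )
        # Null bank: an extra bank the router can pick to inject ~nothing.
        self.null_bank = nn.Parameter(torch.zeros(d_model))
        # One routing logit per (layer, bank); index K is the null bank.
        self.router_logits = nn.Parameter(
            torch.zeros(num_layers, num_blocks + 1)
        )

    def forward(self, token_ids: torch.Tensor, layer_idx: int) -> torch.Tensor:
        # Stack bank outputs: (K, batch, seq, d_model).
        bank_vecs = torch.stack([blk(token_ids) for blk in self.blocks])
        # Broadcast the null bank to the same shape and append it.
        null = self.null_bank.expand_as(bank_vecs[0]).unsqueeze(0)
        bank_vecs = torch.cat([bank_vecs, null], dim=0)  # (K+1, b, s, d)
        # Depth-conditioned softmax over the K+1 banks for this layer.
        weights = F.softmax(self.router_logits[layer_idx], dim=-1)
        return torch.einsum("k,kbsd->bsd", weights, bank_vecs)
```

Under this reading, the null bank gives the router a way to inject approximately nothing at depths where context-free vectors would not help; that interpretation is an inference from the term "null bank", not something the summary states.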
Key facts
- TIDE augments standard transformers with EmbeddingMemory
- Addresses the Rare Token Problem and Contextual Collapse Problem
- Uses K independent MemoryBlocks for context-free semantic vectors
- Injects vectors into every layer via depth-conditioned softmax router (see the integration sketch after this list)
- Includes a learnable null bank
- Paper ID: arXiv:2605.06216
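To show where the routed vectors could enter the network, here is a hypothetical integration loop built on the EmbeddingMemory sketch above. The residual-add injection point and the off-the-shelf encoder layers are assumptions; the paper may combine the vectors differently (e.g., gating or concatenation).

```python
# Hypothetical per-layer injection; assumes the EmbeddingMemory class above.
import torch
import torch.nn as nn

vocab, d_model, K, n_layers = 32000, 512, 4, 12
memory = EmbeddingMemory(vocab, d_model, K, n_layers)
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
    for _ in range(n_layers)
)
tok_embed = nn.Embedding(vocab, d_model)

token_ids = torch.randint(0, vocab, (2, 16))  # (batch, seq)
hidden = tok_embed(token_ids)                 # ordinary input embedding

for i, layer in enumerate(layers):
    # Add the depth-routed, context-free vector before each layer.
    hidden = hidden + memory(token_ids, layer_idx=i)
    hidden = layer(hidden)
```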