Transformers' Depth Limits in State Tracking
A recent study published on arXiv argues that, although transformer architectures excel at sequence modeling, their purely feedforward computation creates a fundamental obstacle to dynamic state tracking. Effective state tracking requires iteratively updating latent variables, which a feedforward network can only emulate by pushing the evolving state into deeper layers with each input step. This leaves the information held in shallower layers inaccessible and eventually exhausts the model's finite depth. Dynamic-depth models and externalized state representations can sidestep this limit, but they tend to be costly in computation and memory. The authors therefore advocate shifting the focus from explicit thought processes to implicit activation dynamics realized through recurrent architectures.
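To make the depth argument concrete, here is a minimal Python sketch of a toy state-tracking task (composing small permutations). The task, the helper names `recurrent_track` and `fixed_depth_track`, and the one-update-per-layer simplification are illustrative assumptions, not details from the paper; the point is only that a recurrent update reuses one state at constant cost per step, while a fixed-depth stack can absorb only a bounded number of sequential updates.

```python
import numpy as np

# Toy state-tracking task: compose a sequence of permutations of 3 items.
# The latent "state" is the current permutation; each input applies another one.
rng = np.random.default_rng(0)
PERMS = [rng.permutation(3) for _ in range(4)]  # a small vocabulary of updates

def recurrent_track(inputs):
    """Recurrent-style tracking: one latent state, updated in place each step.
    Per-step cost is constant; sequence length does not consume extra depth."""
    state = np.arange(3)                      # identity permutation
    for idx in inputs:
        state = state[PERMS[idx]]             # h_t = f(h_{t-1}, x_t)
    return state

def fixed_depth_track(inputs, depth):
    """Feedforward-style tracking: a stack of `depth` layers processes the
    whole sequence at once. In this simplified picture each layer folds in
    one sequential update, so a finite depth caps how many updates can be
    tracked -- the 'depth exhaustion' the paper describes."""
    if depth < len(inputs):
        raise ValueError("not enough layers to compose every update")
    state = np.arange(3)
    for _, idx in zip(range(depth), inputs):
        state = state[PERMS[idx]]             # one composition per layer
    return state

seq = [0, 2, 1, 3, 0, 1]
print(recurrent_track(seq))              # works for any sequence length
print(fixed_depth_track(seq, depth=6))   # works only while depth >= len(seq)
# fixed_depth_track(seq * 2, depth=6)    # raises: sequence exceeds depth
```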
Key facts
- Transformers encode structure via expanding contextual history.
- Feedforward architecture limits dynamic state tracking.
- State tracking involves sequential dependencies that feedforward networks struggle with.
- Feedforward models push state representations deeper into layers, exhausting depth.
- Dynamic depth models and externalized state representations can bypass depth limits.
- These workarounds are inefficient in both computation and memory (see the sketch after this list).
- The paper argues for refocusing on implicit activation dynamics via recurrence.
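A rough sketch of the memory cost of externalizing state, under toy assumptions (a three-element permutation state, one full state dump written back into the context after every update). The helpers `latent_memory` and `externalized_memory` are hypothetical and only illustrate why re-emitting state grows memory linearly while a recurrent latent state stays fixed.

```python
STATE_SIZE = 3  # tokens needed to spell out one permutation of 3 items

def latent_memory(num_updates: int) -> int:
    # A recurrent model keeps one latent state of fixed size,
    # no matter how many updates it has absorbed.
    return STATE_SIZE

def externalized_memory(num_updates: int) -> int:
    # A model that writes the state into its own context after every update
    # must keep every dump around to attend over: memory grows linearly.
    return STATE_SIZE * (num_updates + 1)

for n in (10, 100, 1000):
    print(n, latent_memory(n), externalized_memory(n))
# -> 10 3 33 / 100 3 303 / 1000 3 3003
```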
- Published on arXiv with ID 2604.17121.
Entities
Institutions
- arXiv