FLUID: Continuous-Time Transformer with Liquid Attention
FLUID (Flexible Unified Information Dynamics) is a transformer framework that replaces conventional scaled-dot-product attention with a Liquid Attention Network (LAN), which treats attention logits as a continuous dynamical system. LAN reformulates attention as the solution of a linear ordinary differential equation driven by nonlinear, input-dependent recurrent gates. The authors establish stability guarantees for the LAN dynamics and show that LAN bridges discrete attention and continuous-time RNNs, recovering each as a special case. FLUID also introduces an explicit attention-sink gate that prevents disproportionate attention mass from accumulating on sink tokens. The paper is available on arXiv under ID 2605.04421.
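This summary does not reproduce the paper's equations. As a rough illustration of what a linear ODE driven by nonlinear, input-dependent gates can look like, the sketch below follows the liquid time-constant RNN literature; the symbols z(t) (attention logits), tau (base time constant), g (gate), and z-tilde (instantaneous query-key logits) are assumed notation, not the paper's.

```latex
% Illustrative liquid-style logit dynamics (assumed form, not the paper's notation).
% The ODE is linear in z(t); the nonlinearity enters only through the gate g.
\frac{\mathrm{d}z(t)}{\mathrm{d}t}
  = -\left(\frac{1}{\tau} + g\big(x(t)\big)\right) \odot z(t)
  + g\big(x(t)\big) \odot \tilde{z}\big(x(t)\big)
```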
Key facts
- FLUID replaces scaled-dot-product attention (SDPA) with a Liquid Attention Network (LAN).
- LAN models attention logits as a continuous dynamical system.
- Attention is reformulated as the solution to a linear ODE with input-dependent gates.
- Stability guarantees are established for the LAN dynamics (a sketch of the intuition follows this list).
- LAN interpolates between SDPA and CT-RNNs, recovering each as a special case.
- An explicit attention-sink gate prevents disproportionate attention mass from collecting on sink tokens; both this gate and the interpolation behavior appear in the code sketch after this list.
- The paper is available on arXiv with ID 2605.04421.
- The approach targets continuous-time modeling for irregular and long-range sequences.
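On stability: the paper's proofs are not reproduced here, but under the assumed gated form sketched above the intuition is direct. If the gate satisfies g(x(t)) >= 0 and tau > 0, every logit decays at rate at least 1/tau, so bounded query-key drives yield bounded logits:

```latex
% Bound under the assumed form above, with g(x(t)) >= 0 and tau > 0:
|z_i(t)| \;\le\; e^{-t/\tau}\,|z_i(0)| \;+\; \sup_{s \le t} |\tilde{z}_i(x(s))|
```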
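The Python sketch below shows how such a cell might integrate the logit ODE over a (possibly irregular) time step and apply a sink gate. Everything here is an illustrative assumption, not the paper's implementation: the function name, the sigmoid gate, the exponential-integrator update, and the choice of token 0 as the sink token.

```python
import numpy as np

def liquid_attention_step(q, K, V, z_prev, dt, tau=1.0, w_gate=None, sink_gate=1.0):
    """One step of a toy liquid-attention cell (assumed form, for illustration).

    q: (d,) query; K, V: (n, d) keys/values; z_prev: (n,) logits carried over
    from the previous step; dt: elapsed time since that step (irregular
    sampling is fine); tau: base time constant; w_gate: (d,) parameters of an
    assumed sigmoid gate; sink_gate: in [0, 1], scales the sink token's weight.
    """
    d = K.shape[1]
    z_tilde = K @ q / np.sqrt(d)                     # instantaneous SDPA logits
    w_gate = np.zeros(d) if w_gate is None else w_gate
    g = 1.0 / (1.0 + np.exp(-(K @ w_gate)))          # nonlinear input-dependent gate
    rate = 1.0 / tau + g                             # decay rate, strictly positive
    z_ss = g * z_tilde / rate                        # steady state of the linear ODE
    z = z_ss + (z_prev - z_ss) * np.exp(-rate * dt)  # exact solution, inputs frozen
    a = np.exp(z - z.max())                          # softmax over the context...
    a[0] *= sink_gate                                # ...with the sink token gated
    a /= a.sum()
    return a @ V, z

# Toy usage. With a large tau and a long step, z approaches the SDPA logits
# (discrete attention as a special case); as dt -> 0, z stays at z_prev and
# the cell behaves like a continuous-time RNN state.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
out, z = liquid_attention_step(q, K, V, z_prev=np.zeros(5), dt=0.5, sink_gate=0.1)
```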