Recency-Biased Attention Improves Time-Series Forecasting
Researchers propose a recency-biased causal attention mechanism for Transformers to improve time-series forecasting. Standard Transformer attention treats all time steps equally, ignoring the causal and local structure of temporal data. The proposed method reweights attention scores with a smooth heavy-tailed decay, strengthening local dependencies while retaining flexibility for longer-range correlations. This brings the Transformer closer to the read, ignore, and write operations of RNNs. Experiments show competitive or superior performance on challenging forecasting benchmarks.
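To make the mechanism concrete, here is a minimal sketch of one plausible implementation, assuming a power-law multiplier (1 + d/tau)^(-alpha) on the attention weight at lag d, applied as an additive log-space bias before the causal softmax; the exact decay form and the hyperparameters alpha and tau are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def recency_biased_causal_attention(q, k, v, alpha=2.0, tau=4.0):
    """Single-head scaled dot-product attention with a causal mask and a
    heavy-tailed (power-law) recency bias added to the attention scores.

    q, k, v: arrays of shape (T, d).
    alpha:   tail exponent of the decay (larger = more strongly local).
    tau:     time scale of the decay.
    Note: the decay form and hyperparameters are assumptions for illustration.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                        # (T, T) raw scores

    # Lag of key j behind query i: dist[i, j] = i - j.
    dist = np.arange(T)[:, None] - np.arange(T)[None, :]

    # Heavy-tailed decay as a log-space bias: -alpha * log(1 + lag / tau),
    # i.e. the weight at lag d is scaled by (1 + d/tau)^(-alpha). Unlike
    # exponential decay, this keeps non-negligible weight at long lags.
    scores = scores - alpha * np.log1p(np.maximum(dist, 0) / tau)

    # Causal mask: a query may not attend to future positions.
    scores = np.where(dist < 0, -np.inf, scores)

    # Numerically stable softmax over keys, then weighted sum of values.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Adding the bias before the softmax keeps the attention weights normalized, in the same spirit as ALiBi's additive linear bias, but with a logarithmic penalty in lag, which corresponds to a heavy power-law tail in weight space.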
Key facts
- Recency bias is a useful inductive prior for sequential modeling.
- Standard Transformer attention lacks recency bias due to all-to-all interactions.
- The proposed mechanism reweights attention scores with a smooth heavy-tailed decay (contrasted numerically with exponential decay in the snippet after this list).
- The adjustment strengthens local temporal dependencies.
- It aligns the Transformer with the read, ignore, and write operations of RNNs.
- The approach achieves competitive or superior performance on time-series forecasting benchmarks.
- The paper is from Computer Science > Machine Learning on arXiv.
- The arXiv ID is 2502.06151.
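To illustrate the heavy-tailed decay mentioned above, the short comparison below (with illustrative hyperparameters, not values from the paper) shows that a power-law decay still assigns meaningful weight at long lags where an exponential decay has effectively vanished.

```python
import numpy as np

lags = np.array([1, 4, 16, 64, 256])
alpha, tau, lam = 2.0, 4.0, 0.5           # illustrative hyperparameters

power_law = (1 + lags / tau) ** -alpha    # heavy-tailed decay
exponential = np.exp(-lam * lags)         # exponential decay, for contrast

for lag, p, e in zip(lags, power_law, exponential):
    print(f"lag {lag:>3}: power-law {p:.2e}, exponential {e:.2e}")
```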