Recency-Biased Attention Improves Time-Series Forecasting
Researchers propose a recency-biased causal attention mechanism for Transformers to improve time-series forecasting. Standard Transformer attention treats all time steps equally, ignoring the causal and local structure of temporal data. The proposed method reweights attention scores with a smooth heavy-tailed decay, strengthening local dependencies while retaining flexibility for longer-range correlations. This brings the Transformer closer to the read, ignore, and write operations of RNNs. Experiments show competitive or superior performance on challenging forecasting benchmarks.
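To make the mechanism concrete, here is a minimal sketch of one plausible implementation, assuming a power-law multiplier (1 + d/tau)^(-alpha) on the attention weight at lag d, applied as an additive log-space bias before the causal softmax; the exact decay form and the hyperparameters alpha and tau are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def recency_biased_causal_attention(q, k, v, alpha=2.0, tau=4.0):
    """Single-head scaled dot-product attention with a causal mask and a
    heavy-tailed (power-law) recency bias added to the attention scores.

    q, k, v: arrays of shape (T, d).
    alpha:   tail exponent of the decay (larger = more strongly local).
    tau:     time scale of the decay.
    Note: the decay form and hyperparameters are assumptions for illustration.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                        # (T, T) raw scores

    # Lag of key j behind query i: dist[i, j] = i - j.
    dist = np.arange(T)[:, None] - np.arange(T)[None, :]

    # Heavy-tailed decay as a log-space bias: -alpha * log(1 + lag / tau),
    # i.e. the weight at lag d is scaled by (1 + d/tau)^(-alpha). Unlike
    # exponential decay, this keeps non-negligible weight at long lags.
    scores = scores - alpha * np.log1p(np.maximum(dist, 0) / tau)

    # Causal mask: a query may not attend to future positions.
    scores = np.where(dist < 0, -np.inf, scores)

    # Numerically stable softmax over keys, then weighted sum of values.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Adding the bias before the softmax keeps the attention weights normalized, in the same spirit as ALiBi's additive linear bias, but with a logarithmic penalty in lag, which corresponds to a heavy power-law tail in weight space.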
Key facts
- Recency bias is a useful inductive prior for sequential modeling.
- Standard Transformer attention lacks recency bias due to all-to-all interactions.
- The proposed mechanism reweights attention scores with a smooth heavy-tailed decay (contrasted numerically with exponential decay in the snippet after this list).
- The adjustment strengthens local temporal dependencies.
- It aligns the Transformer with the read, ignore, and write operations of RNNs.
- The approach achieves competitive or superior performance on time-series forecasting benchmarks.
- The paper is from Computer Science > Machine Learning on arXiv.
- The arXiv ID is 2502.06151.
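To illustrate the heavy-tailed decay mentioned above, the short comparison below (with illustrative hyperparameters, not values from the paper) shows that a power-law decay still assigns meaningful weight at long lags where an exponential decay has effectively vanished.

```python
import numpy as np

lags = np.array([1, 4, 16, 64, 256])
alpha, tau, lam = 2.0, 4.0, 0.5           # illustrative hyperparameters

power_law = (1 + lags / tau) ** -alpha    # heavy-tailed decay
exponential = np.exp(-lam * lags)         # exponential decay, for contrast

for lag, p, e in zip(lags, power_law, exponential):
    print(f"lag {lag:>3}: power-law {p:.2e}, exponential {e:.2e}")
```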