ARTFEED — Contemporary Art Intelligence

Recency-Biased Attention Improves Time-Series Forecasting

other · 2026-04-24

Researchers propose a recency-biased causal attention mechanism for Transformers to improve time-series forecasting. Standard Transformer attention treats all time steps equally, ignoring the causal and local structure of temporal data. The new method reweights attention scores with a smooth, heavy-tailed decay, strengthening local dependencies while retaining flexibility for longer-range correlations. This brings the Transformer closer to RNN-style read, ignore, and write operations. Experiments show competitive or superior performance on challenging forecasting benchmarks.
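
To make the idea concrete, here is a minimal NumPy sketch of one plausible form of such a mechanism. The function name recency_biased_causal_attention, the parameter decay_rate, and the specific power-law penalty are illustrative assumptions, not the paper's exact formulation; the summary only specifies a smooth, heavy-tailed decay applied to the attention scores.

    # A minimal sketch of recency-biased causal attention in NumPy.
    import numpy as np

    def recency_biased_causal_attention(q, k, v, decay_rate=1.0):
        """Causal attention whose logits are penalized by a
        heavy-tailed function of the query-key distance."""
        T, d = q.shape
        scores = q @ k.T / np.sqrt(d)  # raw (T, T) attention logits

        # Causal mask: position i may attend only to positions j <= i.
        i, j = np.indices((T, T))
        scores = np.where(j <= i, scores, -np.inf)

        # Recency bias (assumed power-law form): subtracting
        # decay_rate * log(1 + distance) from the logits multiplies
        # the attention weights by (1 + distance)**(-decay_rate),
        # a smooth heavy-tailed decay that strengthens local
        # dependencies while leaving mass on distant steps.
        scores = scores - decay_rate * np.log1p(np.maximum(i - j, 0))

        # Numerically stable softmax over the masked, reweighted logits.
        scores = scores - scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=-1, keepdims=True)
        return weights @ v

    # Tiny usage example on random data.
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
    out = recency_biased_causal_attention(q, k, v, decay_rate=1.0)
    print(out.shape)  # (8, 4)

With decay_rate = 0 the sketch reduces to standard causal attention; larger values shift attention mass toward recent steps, mimicking an RNN's emphasis on fresh inputs while the heavy tail still permits long-range correlations.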

Key facts

  • Recency bias is a useful inductive prior for sequential modeling.
  • Standard Transformer attention lacks recency bias due to all-to-all interactions.
  • The proposed mechanism reweights attention scores with a smooth heavy-tailed decay.
  • The adjustment strengthens local temporal dependencies.
  • It aligns the Transformer with RNN-style read, ignore, and write operations.
  • The approach achieves competitive or superior performance on time-series forecasting benchmarks.
  • The paper appears under Computer Science > Machine Learning (cs.LG) on arXiv.
  • The arXiv ID is 2502.06151.

Entities

Institutions

  • arXiv

Sources

  • arXiv:2502.06151 (https://arxiv.org/abs/2502.06151)