Attention Dispersion Diagnosed in Dynamic Graph Transformers
A study identifies attention dispersion as a failure mode in dynamic graph Transformers under temporal distribution shift. The researchers show that prediction depends on critical nodes that carry a consistent predictive signal, yet existing models fail to concentrate attention on them. They propose a transferable fix based on differential attention.
Key facts
- Transformer architectures dominate Continuous-Time Dynamic Graph learning
- Attention dispersion is a shared failure mode under temporal shift
- Critical nodes carry more predictive signal than arbitrary neighbors
- Standard attention produces overly dispersed distributions
- Differential attention suppresses common-mode noise
- The fix is transferable across models
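The contrast between dispersed standard attention and differential attention can be illustrated with a toy example. The sketch below is not the study's implementation. It assumes the common formulation of differential attention as the difference of two softmax maps, `softmax(a) - lam * softmax(b)`. The scores, the value of `lam`, and the choice of neighbor 0 as the critical node are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def differential_attention(scores_a, scores_b, lam):
    """Difference of two softmax maps: softmax(a) - lam * softmax(b).

    Weight that both maps assign to the same neighbors (common-mode
    noise) is partly cancelled; individual weights may become negative.
    """
    wa = softmax(scores_a)
    wb = softmax(scores_b)
    return [a - lam * b for a, b in zip(wa, wb)]

# Hypothetical scores for six neighbors; neighbor 0 is the critical node.
# Map A favors the critical node only mildly, so attention is dispersed.
scores_a = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# Map B (assumed) mostly attends to the non-critical neighbors.
scores_b = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0]

standard = softmax(scores_a)
diff = differential_attention(scores_a, scores_b, lam=0.5)

# Share of total attention mass that lands on the critical node.
standard_share = standard[0]
diff_share = diff[0] / sum(diff)

print(f"standard share on critical node:     {standard_share:.3f}")
print(f"differential share on critical node: {diff_share:.3f}")
```

In this toy setup the critical node's share of attention rises from roughly 0.35 to roughly 0.64, because the weight that both maps spread over the non-critical neighbors is subtracted out. A real model would learn both maps and `lam` rather than fixing them by hand.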
Entities
Institutions
- arXiv