Weierstrass Elliptic Positional Encoding Improves Vision Transformers
A recent arXiv preprint (2605.23719) presents Weierstrass elliptic Positional Encoding (WePE), a positional encoding designed for Vision Transformers (ViTs). Standard ViTs rely on learnable one-dimensional positional encodings, which fail to preserve the two-dimensional spatial arrangement of an image once its patches are flattened into a sequence. WePE addresses this by mapping normalized 2D patch coordinates onto the complex plane and building compact four-dimensional positional features from the Weierstrass elliptic function and its derivative. According to the preprint, the function's double periodicity gives a principled representation of 2D positions and keeps a monotonic relationship between Euclidean spatial distances and sequential index distances. The approach aims to improve ViTs' ability to exploit spatial proximity priors, which existing encodings often miss because they lack geometric constraints.
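To make the construction described above more concrete, here is a minimal NumPy sketch, not the preprint's implementation. It approximates the Weierstrass function by a truncated version of its standard lattice sum, ℘(z) = 1/z² + Σ over nonzero lattice points ω of [1/(z−ω)² − 1/ω²], together with the derivative ℘′(z) = −2 Σ over all ω of 1/(z−ω)³. Several details are assumptions made only for illustration: the square lattice with periods 1 and i, the truncation size `n_terms`, the use of patch centers as normalized coordinates, the reading of "four-dimensional" as the real and imaginary parts of ℘ and ℘′, and the function names `weierstrass_p` and `wepe_features`.

```python
import numpy as np

def weierstrass_p(z, omega1=1.0, omega2=1.0j, n_terms=8):
    """Truncated lattice-sum approximation of the Weierstrass function and its derivative.

    omega1, omega2 are the lattice periods (assumed values, not taken from the preprint).
    """
    z = np.asarray(z, dtype=np.complex128)
    p = 1.0 / z**2       # contribution of the lattice point at the origin
    dp = -2.0 / z**3
    for m in range(-n_terms, n_terms + 1):
        for n in range(-n_terms, n_terms + 1):
            if m == 0 and n == 0:
                continue
            w = m * omega1 + n * omega2
            p = p + 1.0 / (z - w) ** 2 - 1.0 / w**2
            dp = dp - 2.0 / (z - w) ** 3
    return p, dp

def wepe_features(grid_h, grid_w, omega1=1.0, omega2=1.0j, n_terms=8):
    """Return a (grid_h * grid_w, 4) array of illustrative positional features."""
    # Patch centers lie strictly inside (0, 1), so no coordinate lands on a pole.
    ys = (np.arange(grid_h) + 0.5) / grid_h
    xs = (np.arange(grid_w) + 0.5) / grid_w
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    # Map each normalized (x, y) pair to a point of the fundamental parallelogram.
    z = (xx * omega1 + yy * omega2).reshape(-1)
    p, dp = weierstrass_p(z, omega1, omega2, n_terms)
    # Assumed feature layout: [Re p, Im p, Re p', Im p'].
    return np.stack([p.real, p.imag, dp.real, dp.imag], axis=-1)

features = wepe_features(14, 14)  # e.g. a 14x14 patch grid
```

In a ViT these four values per patch would presumably be projected to the embedding width by a small learned layer before being combined with the patch embeddings; that step is an assumption here and is not shown. Note also that the function has poles at the lattice points, so the raw features grow large for patches near the corners of the grid, and some form of scaling may be needed in practice.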
Key facts
- arXiv preprint 2605.23719 proposes Weierstrass elliptic Positional Encoding (WePE) for Vision Transformers.
- Current ViTs use learnable one-dimensional positional encodings that weaken 2D spatial structure.
- WePE maps normalized 2D patch coordinates onto the complex plane.
- WePE constructs four-dimensional positional features using the Weierstrass elliptic function and its derivative.
- Double periodicity provides a principled representation of 2D positions.
- WePE maintains a monotonic relationship between Euclidean distances and sequential index distances.
- Existing positional encodings lack geometric constraints and spatial proximity priors.
- The method is mathematically grounded and motivated by periodicity in positional encoding.
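The mismatch between sequence index and 2D position noted in the facts above can be illustrated with a small sketch. It assumes row-major flattening of a 14x14 patch grid (a hypothetical but typical setup, e.g. a 224x224 image with 16x16 patches); the helper names are illustrative only.

```python
import math

GRID_W = 14  # assumed number of patches per row

def patch_coords(index, grid_w=GRID_W):
    """Convert a flattened (row-major) patch index back to (row, col)."""
    return divmod(index, grid_w)

def euclidean(i, j, grid_w=GRID_W):
    """Euclidean distance between two patches, measured in patch units."""
    (r1, c1), (r2, c2) = patch_coords(i, grid_w), patch_coords(j, grid_w)
    return math.hypot(r1 - r2, c1 - c2)

# Patches 0 and 14 are vertical neighbours, yet 14 positions apart in the sequence.
vertical_neighbours = euclidean(0, 14)
# Patches 13 and 14 are adjacent in the sequence, yet sit at opposite ends of two rows.
sequence_neighbours = euclidean(13, 14)
```

Here `vertical_neighbours` is 1.0 while `sequence_neighbours` is greater than 13, so under plain flattening a small index distance does not imply spatial closeness, and vice versa.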
Entities
Institutions
- arXiv