EΔ-MHC-Geo Transformer: Adaptive Geodesic Operations with Guaranteed Orthogonality
Researchers have introduced the EΔ-MHC-Geo Transformer, an architecture that unifies Manifold-Constrained Hyper-Connections (mHC), Deep Delta Learning (DDL), and the Cayley transform. Its residual connections are orthogonal and adapt to the input. DDL's Householder operator is orthogonal only at β ∈ {0, 2}, whereas the Data-Dependent Cayley rotation stays orthogonal for every β and every input. Because the Cayley transform of a skew-symmetric matrix never has eigenvalue -1, a pure rotation cannot represent negation; a hybrid variant, the EΔ-MHC-Geo Hybrid, covers that case with a learned operator-selection gate, and a midpoint-collapse regularizer pushes the gate toward its boundary values. In matched-parameter comparisons at approximately 1.79 million parameters per model, the reported tests indicate improved performance over existing systems.
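The orthogonality claims can be checked by hand. The derivation below is a sketch under two assumptions that the summary does not spell out: that DDL's Householder operator has the rank-one form H_β = I - β k kᵀ with a unit vector k, and that the Cayley rotation is built from a skew-symmetric matrix A. The symbols H_β, k, A, and Q are notation chosen here for illustration.

```latex
\begin{align*}
H_\beta &= I - \beta\, k k^\top, \qquad \|k\|_2 = 1, \\
H_\beta^\top H_\beta &= I - 2\beta\, k k^\top + \beta^2\, k\,(k^\top k)\,k^\top
                      = I + \beta(\beta - 2)\, k k^\top, \\
Q &= (I - A)^{-1}(I + A), \qquad A^\top = -A, \\
Q^\top Q &= (I - A)(I + A)^{-1}(I - A)^{-1}(I + A) = I, \\
Q + I &= (I - A)^{-1}\bigl[(I + A) + (I - A)\bigr] = 2\,(I - A)^{-1}.
\end{align*}
```

The second line equals I exactly when β(β - 2) = 0, that is, at β = 0 or β = 2. The fourth line uses the fact that I + A and I - A commute, so Q is orthogonal for any skew-symmetric A. The last line shows that Q + I is always invertible, so -1 is never an eigenvalue of Q, whereas H_2 maps k to -k. Under these assumptions, that is the gap an operator-selection gate would need to cover.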
Key facts
- The EΔ-MHC-Geo Transformer unifies mHC, DDL, and the Cayley transform.
- Data-Dependent Cayley rotation preserves orthogonality for all β and inputs.
- DDL's Householder operator is orthogonal only at β ∈ {0,2}.
- The EΔ-MHC-Geo Hybrid handles the eigenvalue -1 case via a learned operator-selection gate.
- Midpoint-collapse regularizer encourages boundary gate decisions.
- Matched-parameter comparisons use approximately 1.79M parameters per model.
- The architecture is presented on arXiv with ID 2605.06729.
- The method is input-adaptive and unconditionally orthogonal.
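
The orthogonality facts above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes the Householder-type operator has the form I - β u uᵀ for a unit vector u and that the Cayley rotation is (I - A)⁻¹(I + A) with A = W - Wᵀ. All function names, the matrix `W`, and the vector `K` are hypothetical. The final loop blends the two operators linearly with a gate value g, which is only one possible reading of "operator selection", chosen to show why a gate sitting at the midpoint would break orthogonality.

```python
import math

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def combine(ca, a, cb, b):
    """Entrywise ca * A + cb * B."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def orthogonality_error(m):
    """Largest absolute entry of M^T M - I; zero means M is orthogonal."""
    gram = matmul(transpose(m), m)
    return max(abs(v) for row in combine(1.0, gram, -1.0, eye(len(m))) for v in row)

def householder_like(k, beta):
    """Rank-one operator I - beta * u u^T, where u is k scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in k))
    u = [[x / norm] for x in k]  # column vector
    return combine(1.0, eye(len(k)), -beta, matmul(u, transpose(u)))

def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cayley(w):
    """Cayley transform (I - A)^{-1} (I + A) of the skew-symmetric part A = W - W^T."""
    a = combine(1.0, w, -1.0, transpose(w))
    i = eye(len(w))
    return solve(combine(1.0, i, -1.0, a), combine(1.0, i, 1.0, a))

# Hypothetical inputs: arbitrary unconstrained weights and an arbitrary direction.
W = [[0.3, -1.2, 0.5], [0.7, 0.1, 2.0], [-0.4, 0.9, -0.6]]
K = [1.0, 2.0, 2.0]

Q = cayley(W)
for beta in (0.0, 0.5, 1.0, 2.0):
    err = orthogonality_error(householder_like(K, beta))
    print(f"householder-like, beta={beta}: error={err:.3e}")
print(f"cayley: error={orthogonality_error(Q):.3e}")

# Assumed linear blend of the two operators; the source does not specify this form.
for g in (0.0, 0.5, 1.0):
    blend = combine(1.0 - g, Q, g, householder_like(K, 2.0))
    print(f"blend, g={g}: error={orthogonality_error(blend):.3e}")
```

In this sketch the Householder-type error vanishes only at β = 0 and β = 2, the Cayley error stays at rounding level for any `W`, and the blended operator is orthogonal only when g sits at 0 or 1. That last point is consistent with a regularizer that encourages boundary gate decisions, though the actual gate and regularizer in the paper may be defined differently.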
Entities
Institutions
- arXiv