ARTFEED — Contemporary Art Intelligence

TBP-mHC: Full Expressivity for Manifold-Constrained Hyper Connections via Transportation Polytopes

other · 2026-05-23

A new parameterization, the Transportation Birkhoff Polytope (TBP), and its recursive variant (RTBP) have been introduced to address training instability in Hyper-Connections (HC) within residual networks. HC extends residual networks with learnable mixing across multiple residual streams, but unconstrained mixing destabilizes training. Manifold-Constrained Hyper-Connections (mHC) impose approximate double stochasticity through Sinkhorn normalization, whereas mHC-lite satisfies the constraints exactly by using convex combinations of permutation matrices, at factorial cost. KromHC reduces that cost with Kronecker-product parameterizations but confines the mixing matrices to a structured submanifold of the Birkhoff polytope. TBP and RTBP produce exactly doubly stochastic mixing matrices with (n-1)^2 degrees of freedom, avoiding both iterative normalization and combinatorial explosion while retaining the full expressivity of the Birkhoff polytope. Empirical results on language tasks are reported in support of the approach.
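
The trade-off between the two earlier schemes can be illustrated with a minimal Python sketch. It is not the TBP or RTBP construction, which this summary does not spell out. The function names (sinkhorn, permutation_mixture, max_sum_error), the softmax weighting of the permutations, and the 3x3 example matrix are all assumptions chosen for illustration.

```python
import itertools
import math


def sinkhorn(matrix, iterations):
    """Alternately normalize rows, then columns, of a positive square matrix."""
    m = [row[:] for row in matrix]
    n = len(m)
    for _ in range(iterations):
        # Row step: make every row sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column step: make every column sum to 1 (this disturbs the rows again).
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def max_sum_error(m):
    """Largest deviation of any row or column sum from 1."""
    n = len(m)
    row_err = max(abs(sum(row) - 1.0) for row in m)
    col_err = max(abs(sum(m[i][j] for i in range(n)) - 1.0) for j in range(n))
    return max(row_err, col_err)


def permutation_mixture(logits, n):
    """Softmax-weighted convex combination of all n! permutation matrices.

    Assumed weighting scheme; the point is that the number of weights is n!.
    """
    perms = list(itertools.permutations(range(n)))
    assert len(logits) == len(perms) == math.factorial(n)
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    m = [[0.0] * n for _ in range(n)]
    for e, perm in zip(exps, perms):
        for i, j in enumerate(perm):
            m[i][j] += e / total
    return m


raw = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
print("Sinkhorn error after 1 iteration:", max_sum_error(sinkhorn(raw, 1)))
print("Sinkhorn error after 50 iterations:", max_sum_error(sinkhorn(raw, 50)))
print("Permutation mixture error:",
      max_sum_error(permutation_mixture([0.1, -0.3, 0.5, 0.0, 1.2, -0.7], 3)))
print("Weights needed for n = 8:", math.factorial(8))
```

With a finite number of Sinkhorn iterations the row and column sums only approach 1, while the permutation mixture is doubly stochastic up to floating-point rounding but needs one weight per permutation.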

Key facts

  • TBP and RTBP parameterizations construct exactly doubly stochastic mixing matrices.
  • They achieve (n-1)^2 degrees of freedom.
  • The approach avoids iterative normalization and combinatorial explosion.
  • It preserves full expressivity of the Birkhoff polytope.
  • Empirical results on language tasks are reported.
  • Hyper-Connections improve residual networks via learnable mixing across multiple residual streams.
  • Unconstrained mixing leads to training instability.
  • mHC enforces approximate double stochasticity via Sinkhorn normalization.
  • mHC-lite ensures exact constraints via convex combinations of permutation matrices at factorial cost.
  • KromHC uses Kronecker-product parameterizations but restricts to a structured submanifold.
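
The expressivity gap named in the last bullet can be made concrete with a small Python sketch. It assumes, purely for illustration, a 4x4 mixing matrix built as the Kronecker product of two 2x2 doubly stochastic factors; the summary does not state how KromHC actually chooses its factors, and the helper names (two_by_two, kron, is_doubly_stochastic, birkhoff_dof) are hypothetical.

```python
def two_by_two(a):
    """Every 2x2 doubly stochastic matrix has this form for some a in [0, 1]."""
    return [[a, 1.0 - a], [1.0 - a, a]]


def kron(p, q):
    """Kronecker product of two square matrices given as nested lists."""
    return [[pv * qv for pv in p_row for qv in q_row]
            for p_row in p for q_row in q]


def is_doubly_stochastic(m, tol=1e-12):
    """Check non-negativity and that all row and column sums equal 1."""
    n = len(m)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in m)
    cols_ok = all(abs(sum(m[i][j] for i in range(n)) - 1.0) <= tol
                  for j in range(n))
    nonneg = all(v >= 0.0 for row in m for v in row)
    return rows_ok and cols_ok and nonneg


def birkhoff_dof(n):
    """Dimension of the set of n x n doubly stochastic matrices."""
    return (n - 1) ** 2


# Two scalar parameters (0.7 and 0.2) determine the whole 4x4 matrix.
mix = kron(two_by_two(0.7), two_by_two(0.2))
print("doubly stochastic:", is_doubly_stochastic(mix))
print("free parameters in this Kronecker form:", 2 * birkhoff_dof(2))
print("degrees of freedom of the full 4x4 polytope:", birkhoff_dof(4))
```

Under these assumptions the Kronecker form stays exactly doubly stochastic but spans only a low-dimensional slice of the polytope, which is the gap a full (n-1)^2 parameterization is meant to close.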