B-Spline Decoupling Improves Transformer Compression
A new decoupling framework based on B-splines generalizes current tensor-based techniques for compressing transformer models. Decoupling expresses a multivariate function as a linear transformation, followed by a set of univariate nonlinear functions, followed by another linear transformation. In its single-layer form it is equivalent to a fully connected neural network with one hidden layer whose activation functions are flexible rather than fixed. Existing tensor-based methods parameterize the univariate functions as polynomials or piecewise-linear functions: polynomials can be numerically unstable, while piecewise-linear functions are limited in expressiveness. The new framework exploits the local support of B-splines and their flexible smoothness control to address both problems. The work is available on arXiv (2605.18794).
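The decoupled form described above can be written as f(x) = W g(V^T x). The input is projected with V^T, each projection z_i passes through its own univariate function g_i, and the results are mixed with W. The Python below is a minimal, dependency-free sketch built on illustrative assumptions, not the paper's parameterization. Each g_i is a B-spline evaluated with the Cox-de Boor recursion, all branches share one clamped knot vector, and inputs outside the knot range are clamped rather than extrapolated. The function names (`bspline_basis`, `spline`, `decoupled`) are hypothetical. If every g_i were the same fixed activation, this would be an ordinary one-hidden-layer network with first-layer weights V^T and output weights W.

```python
from typing import List, Sequence


def bspline_basis(i: int, k: int, t: Sequence[float], x: float) -> float:
    """Value at x of the i-th degree-k B-spline basis function on knot vector t (Cox-de Boor)."""
    if k == 0:
        # Indicator of the half-open knot interval [t[i], t[i+1]).
        if t[i] <= x < t[i + 1]:
            return 1.0
        # Also close the last non-empty interval on the right so that x == t[-1] is covered.
        if x == t[-1] and t[i] < t[i + 1] == t[-1]:
            return 1.0
        return 0.0
    left = 0.0
    if t[i + k] != t[i]:  # repeated knots would give 0/0; such terms are defined as 0
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * bspline_basis(i + 1, k - 1, t, x)
    return left + right


def spline(coeffs: Sequence[float], k: int, t: Sequence[float], x: float) -> float:
    """Univariate spline g(x) = sum_j c_j * B_{j,k}(x); needs len(coeffs) == len(t) - k - 1."""
    assert len(coeffs) == len(t) - k - 1
    # Assumption for this sketch: clamp inputs to the knot range instead of extrapolating.
    x = min(max(x, t[0]), t[-1])
    return sum(c * bspline_basis(j, k, t, x) for j, c in enumerate(coeffs))


def decoupled(x: Sequence[float], VT: Sequence[Sequence[float]], W: Sequence[Sequence[float]],
              branch_coeffs: Sequence[Sequence[float]], k: int, t: Sequence[float]) -> List[float]:
    """Evaluate f(x) = W g(V^T x) with one B-spline branch per row of VT."""
    # Step 1: linear projections z_i = v_i^T x (row i of VT is v_i).
    z = [sum(v * xj for v, xj in zip(row, x)) for row in VT]
    # Step 2: each branch applies its own univariate spline (shared knots, separate coefficients).
    g = [spline(c, k, t, zi) for c, zi in zip(branch_coeffs, z)]
    # Step 3: linear mixing of the branch outputs.
    return [sum(w * gi for w, gi in zip(row, g)) for row in W]


# Usage: degree-1 branches whose coefficients equal the knot values reproduce g(z) = z on [0, 2],
# so the decoupled map collapses to the linear map W V^T x.
knots = [0.0, 0.0, 1.0, 2.0, 2.0]
print(decoupled([0.5, 0.25], [[1, 0], [0, 1]], [[1, 1]], [[0, 1, 2], [0, 1, 2]], 1, knots))  # prints [0.75]
```

In a compression setting, the branch coefficients (together with V and W) would instead be fitted to approximate the layer being replaced. That fitting step is not shown here.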
Key facts
- Decoupling is a modeling paradigm for multivariate functions.
- Single-layer decoupling is equivalent to a fully connected neural network with one hidden layer and flexible activation functions.
- Decoupling methods are used for neural network compression.
- Existing tensor-based decoupling uses polynomial or piecewise-linear functions.
- The B-spline framework generalizes existing approaches.
- B-splines offer local support and smoothness control.
- The work appears on arXiv with ID 2605.18794.
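The local-support property listed above can be checked directly. A degree-k B-spline basis function B_j is nonzero only on the knot interval [t_j, t_{j+k+1}), so changing one spline coefficient alters the function only there. This is a standard property of B-splines, not something specific to the paper. The plain-Python sketch below uses hypothetical helper names and an arbitrary illustrative knot vector. It bumps one coefficient of a cubic spline on [0, 6] and prints how much the spline changes at a few points.

```python
from typing import Sequence


def bspline_basis(i: int, k: int, t: Sequence[float], x: float) -> float:
    """Value at x of the i-th degree-k B-spline basis function on knot vector t (Cox-de Boor)."""
    if k == 0:
        # Indicator of [t[i], t[i+1]), with the last non-empty interval closed on the right.
        if t[i] <= x < t[i + 1]:
            return 1.0
        if x == t[-1] and t[i] < t[i + 1] == t[-1]:
            return 1.0
        return 0.0
    left = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * bspline_basis(i + 1, k - 1, t, x)
    return left + right


def spline(coeffs: Sequence[float], k: int, t: Sequence[float], x: float) -> float:
    """Spline value sum_j c_j * B_{j,k}(x) for x inside the knot range."""
    return sum(c * bspline_basis(j, k, t, x) for j, c in enumerate(coeffs))


# Cubic spline on [0, 6]: clamped end knots, uniform interior knots (illustrative choice).
k = 3
t = [0.0] * 4 + [1.0, 2.0, 3.0, 4.0, 5.0] + [6.0] * 4
n = len(t) - k - 1  # 9 coefficients
base = [1.0] * n
bumped = list(base)
bumped[2] += 1.0  # perturb a single coefficient

# B_{2,3} is supported on [t[2], t[6]) = [0, 3), so only values there can change.
for x in [0.5, 2.5, 3.0, 4.5, 6.0]:
    delta = spline(bumped, k, t, x) - spline(base, k, t, x)
    print(f"x={x}: change={delta:.4f}")
```

The changes at x = 3.0, 4.5 and 6.0 come out exactly zero because the bumped basis function is supported on [0, 3). A global polynomial basis has no such property, since changing one monomial coefficient affects the whole domain. Smoothness is controlled through the degree and the knot multiplicity. At an interior knot of multiplicity m, a degree-k spline is C^(k-m) continuous, so repeating a knot deliberately allows a kink or jump there.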
Entities
Institutions
- arXiv