B-Spline Decoupling Improves Transformer Compression

ai-technology · 2026-05-20

A new decoupling framework based on B-splines extends current tensor-based techniques for compressing transformer models. Decoupling expresses a multivariate function as a combination of linear transformations and univariate nonlinear functions, which corresponds to a neural network with a single hidden layer and flexible activation functions. Existing methods rely on polynomial or piecewise-linear parameterizations, which suffer from numerical instability or limited expressiveness. The proposed framework uses the local support of B-splines and their adjustable smoothness to address these problems. The work is available on arXiv (2605.18794).
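
For readers who prefer a formula, the decoupled form described above is commonly written as follows. The notation is an assumption for illustration and is not taken from the paper: V and W are the input and output linear transformations, r is the number of branches, and each univariate function g_i is expanded in K B-spline basis functions B_{k,p} of degree p with coefficients c_{i,k}.

```latex
% Assumed notation; the paper may use different symbols.
f(\mathbf{x}) \approx \mathbf{W}\,\mathbf{g}\!\left(\mathbf{V}^{\top}\mathbf{x}\right),
\qquad
\mathbf{g}(\mathbf{z}) = \bigl[\, g_1(z_1), \dots, g_r(z_r) \,\bigr]^{\top},
\qquad
g_i(z) = \sum_{k=1}^{K} c_{i,k}\, B_{k,p}(z)
```

Read as a network, V^T x is the hidden-layer pre-activation, each g_i is a per-neuron activation, and W is the output layer.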

Key facts

Decoupling is a modeling paradigm for multivariate functions.
Single-layer decoupling is equivalent to a fully connected neural network with one hidden layer and flexible activation functions.
Decoupling methods are used for neural network compression.
Existing tensor-based decoupling uses polynomial or piecewise-linear functions.
The B-spline framework generalizes existing approaches.
B-splines offer local support and smoothness control.
The work appears on arXiv with ID 2605.18794.
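
The facts above can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the function names (`bspline_basis`, `spline_activation`, `decoupled`), the uniform knot vector, the cubic degree, and the small matrices V and W are all hypothetical choices made for illustration. The sketch evaluates B-spline basis functions with the standard Cox-de Boor recursion, uses them as univariate activations, and wraps them in the linear-nonlinear-linear structure of a single-layer decoupling.

```python
def bspline_basis(k, degree, knots, z):
    """Value at z of the k-th B-spline basis function (Cox-de Boor recursion)."""
    if degree == 0:
        # Degree-0 basis: indicator of the half-open knot interval.
        return 1.0 if knots[k] <= z < knots[k + 1] else 0.0
    left_den = knots[k + degree] - knots[k]
    right_den = knots[k + degree + 1] - knots[k + 1]
    left = 0.0
    right = 0.0
    if left_den != 0:
        left = (z - knots[k]) / left_den * bspline_basis(k, degree - 1, knots, z)
    if right_den != 0:
        right = ((knots[k + degree + 1] - z) / right_den
                 * bspline_basis(k + 1, degree - 1, knots, z))
    return left + right


def spline_activation(coeffs, degree, knots, z):
    """Univariate function g(z) = sum_k coeffs[k] * B_k(z)."""
    return sum(c * bspline_basis(k, degree, knots, z)
               for k, c in enumerate(coeffs))


def decoupled(x, V, W, coeffs, degree, knots):
    """Evaluate W g(V^T x): linear map, per-branch spline, linear map."""
    n_branches = len(V[0])
    # z = V^T x, one scalar per branch
    z = [sum(V[i][j] * x[i] for i in range(len(x))) for j in range(n_branches)]
    # apply a separate univariate spline to each branch
    g = [spline_activation(coeffs[j], degree, knots, z[j])
         for j in range(n_branches)]
    # output layer
    return [sum(row[j] * g[j] for j in range(n_branches)) for row in W]


# Hypothetical setup: uniform knots 0..8 and cubic splines (degree 3).
knots = [float(t) for t in range(9)]
degree = 3
n_basis = len(knots) - degree - 1  # 5 basis functions

# Local support: basis function 0 is nonzero only on [0, 4), so changing
# its coefficient leaves the spline untouched at z = 4.5.
coeffs_a = [1.0] * n_basis
coeffs_b = [5.0] + [1.0] * (n_basis - 1)
print(spline_activation(coeffs_a, degree, knots, 4.5))
print(spline_activation(coeffs_b, degree, knots, 4.5))

# Two inputs, two branches, one output (all values are made up).
V = [[1.0, 2.0], [1.5, 0.5]]
W = [[2.0, -1.0]]
print(decoupled([1.0, 2.0], V, W, [coeffs_a, coeffs_a], degree, knots))
```

In this sketch the degree sets the smoothness (a degree-p spline with simple knots has p - 1 continuous derivatives), and degree 1 gives a piecewise-linear function, one of the parameterizations the article attributes to existing methods. A polynomial coefficient, by contrast, affects the function everywhere, which is one intuition for why local support can help numerically.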
