Motion-Compensated Weight Compression for Neural Networks
Motion-Compensated Weight Compression (MCWC) is a weight compression technique introduced in a paper on arXiv. It aligns permutation-symmetric blocks, such as hidden units and attention heads, to expose redundancy across layers, so that depth can be treated as a predictable sequence. It uses a lightweight layer-sequential predictor with periodic keyframes and encodes the quantized prediction residuals with a learned entropy model. The decoder reconstructs the weights through entropy decoding, dequantization, predictor-guided reconstruction, and inverse alignment. The method is reported to improve compression efficiency on Transformer language modeling and vision classification tasks.
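The keyframe-plus-residual idea can be illustrated with a minimal sketch. It is not the paper's implementation: the predictor here is a plain "copy the previous reconstructed layer" rule, the quantizer is uniform, and the names `encode`, `decode`, `keyframe_interval`, and `step` are hypothetical. The learned entropy model and the alignment step are omitted; the integer symbols returned by `encode` are what an entropy coder would consume.

```python
from typing import List, Optional

Layer = List[float]        # one layer's weights, flattened (illustrative)
Symbols = List[int]        # quantized integers handed to an entropy coder


def _predict(prev: Optional[Layer], size: int, is_key: bool) -> Layer:
    # Assumption: keyframes are coded without prediction (predict zeros);
    # other layers are predicted as a copy of the previous reconstruction.
    if is_key or prev is None:
        return [0.0] * size
    return prev


def encode(layers: List[Layer], keyframe_interval: int = 4,
           step: float = 0.05) -> List[Symbols]:
    """Quantize each layer's prediction residual with a uniform step."""
    out: List[Symbols] = []
    prev: Optional[Layer] = None
    for i, weights in enumerate(layers):
        pred = _predict(prev, len(weights), i % keyframe_interval == 0)
        q = [round((w - p) / step) for w, p in zip(weights, pred)]
        out.append(q)
        # Track the decoder's reconstruction so quantization error
        # does not accumulate from layer to layer.
        prev = [p + s * step for p, s in zip(pred, q)]
    return out


def decode(symbols: List[Symbols], keyframe_interval: int = 4,
           step: float = 0.05) -> List[Layer]:
    """Dequantize residuals and add them back to the prediction."""
    layers: List[Layer] = []
    prev: Optional[Layer] = None
    for i, q in enumerate(symbols):
        pred = _predict(prev, len(q), i % keyframe_interval == 0)
        prev = [p + s * step for p, s in zip(pred, q)]
        layers.append(prev)
    return layers
```

Because the encoder predicts from the reconstructed layer rather than the original one, each decoded weight stays within half a quantization step of its original value, regardless of how far it is from the last keyframe.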
Key facts
- MCWC stands for Motion-Compensated Weight Compression.
- It aligns permutation-symmetric blocks such as hidden units and attention heads.
- The method turns depth into a predictable sequence.
- It uses a lightweight layer-sequential predictor with periodic keyframes.
- It encodes quantized prediction residuals using a learned entropy model.
- The decoder reconstructs deployable weights for fast inference.
- It was evaluated on Transformer language modeling and vision classification.
- It is reported to improve compression performance over existing methods.
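The alignment and inverse-alignment steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: each hidden unit is represented as a row of weights, rows are matched to a reference block by greedy nearest-neighbor search on squared distance, and the helper names (`greedy_align`, `apply_perm`, `invert_perm`) are hypothetical. A real system might use an optimal assignment solver instead of greedy matching.

```python
from typing import List

Block = List[List[float]]  # rows are hidden units (illustrative layout)


def sq_dist(u: List[float], v: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def greedy_align(reference: Block, block: Block) -> List[int]:
    """Return perm so that block[perm[i]] is the unit matched to reference[i]."""
    free = set(range(len(block)))
    perm: List[int] = []
    for ref_row in reference:
        # Pick the closest still-unmatched unit (greedy, not optimal).
        j = min(free, key=lambda k: sq_dist(ref_row, block[k]))
        perm.append(j)
        free.remove(j)
    return perm


def apply_perm(block: Block, perm: List[int]) -> Block:
    return [block[j] for j in perm]


def invert_perm(perm: List[int]) -> List[int]:
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return inv


# Usage: align before coding, then undo the permutation after decoding.
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
block = [[0.0, 1.1], [0.9, 1.0], [1.0, 0.1]]   # same units, shuffled and perturbed
perm = greedy_align(reference, block)
aligned = apply_perm(block, perm)               # row i now resembles reference[i]
restored = apply_perm(aligned, invert_perm(perm))
```

After alignment, corresponding rows in adjacent blocks are close to one another, which is what makes a layer-to-layer predictor useful; the inverse permutation restores the original unit order so the reconstructed weights match the deployed network's layout.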
Entities
Institutions
- arXiv