ELAS: Efficient Pre-Training of Low-Rank LLMs via 2:4 Activation Sparsity
A new framework called ELAS (Efficient Pre-training of Low-rank LLMs via 2:4 Activation Sparsity) is proposed to address computational bottlenecks in training large language models. The method combines low-rank training, which reduces memory usage, with 2:4 structured sparsity applied to activations (not weights) to exploit NVIDIA GPUs' hardware support for the 2:4 sparse format. Existing low-rank approaches keep activation matrices at full rank, so activations dominate memory consumption and limit throughput in large-batch training; applying sparsity directly to weights, on the other hand, often degrades model quality. By sparsifying activations specifically, ELAS aims to cut memory use and raise throughput without significant accuracy loss. The paper is published on arXiv under ID 2605.03667.
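To make the 2:4 pattern concrete, here is a minimal NumPy sketch of structured activation sparsification: in every contiguous group of 4 values along the last axis, the 2 largest-magnitude entries are kept and the other 2 are zeroed. This is an illustrative reconstruction of the general 2:4 scheme, not the paper's actual sparsifier; the function name and the magnitude-based selection rule are assumptions.

```python
import numpy as np

def sparsify_2_4(x):
    """Illustrative 2:4 structured sparsity along the last axis:
    in each contiguous group of 4 values, keep the 2 with the largest
    magnitude and zero the other 2 (magnitude rule is an assumption)."""
    groups = x.reshape(-1, 4)                       # one row per group of 4
    # indices of the 2 smallest-magnitude entries in each group
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    out = groups.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(x.shape)

acts = np.array([[0.9, -0.1, 0.05, -1.2, 0.3, 0.7, -0.2, 0.0]])
sparse_acts = sparsify_2_4(acts)
# each group of 4 now contains exactly 2 zeros:
# [[0.9, 0.0, 0.0, -1.2, 0.3, 0.7, 0.0, 0.0]]
```

The resulting matrix matches the 2:4 pattern that NVIDIA's sparse Tensor Cores accelerate, which is why this particular sparsity ratio (rather than unstructured pruning) is attractive for training throughput.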
Key facts
- ELAS stands for Efficient Pre-training of Low-rank LLMs via 2:4 Activation Sparsity.
- The framework targets efficient pre-training of large language models.
- It combines low-rank training with 2:4 structured sparsity on activations.
- 2:4 structured sparsity is supported by NVIDIA GPUs.
- Existing low-rank methods keep activation matrices at full rank, causing high memory consumption.
- Direct weight sparsity leads to non-negligible performance degradation.
- ELAS aims to reduce memory and improve throughput during large-batch training.
- The paper is available on arXiv with ID 2605.03667.
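The combination described in the facts above can be sketched as a forward pass through a low-rank linear layer fed with 2:4-sparsified activations. All sizes, names, and the initialization below are hypothetical, and the inline sparsifier is the same magnitude-based stand-in as above; the sketch only illustrates how the two ideas compose, not ELAS's actual architecture.

```python
import numpy as np

def sparsify_2_4(x):
    # stand-in 2:4 rule: keep the 2 largest-magnitude values per group of 4
    g = x.reshape(-1, 4)
    drop = np.argsort(np.abs(g), axis=1)[:, :2]
    out = g.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(x.shape)

rng = np.random.default_rng(0)
batch, d, r = 4, 16, 4                    # hypothetical batch, width, rank

# low-rank factorization W ~ A @ B replaces a dense d x d weight
A = rng.standard_normal((d, r)) / np.sqrt(d)
B = rng.standard_normal((r, d)) / np.sqrt(r)

x = rng.standard_normal((batch, d))       # incoming activations
y = sparsify_2_4(x) @ A @ B               # sparse-activation, low-rank matmul

# low-rank stores 2*d*r parameters instead of d*d
print(2 * d * r, d * d)                   # 128 vs 256 here
```

On real hardware the `sparsify_2_4(x) @ A` product is where the 2:4 format pays off, since the sparse operand can be routed through the GPU's sparse matrix-multiply units while the low-rank factors keep the parameter and optimizer memory small.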
Entities
Institutions
- arXiv
- NVIDIA