BiSpikCLM: First Fully Binary Spiking Language Model
A team of researchers has introduced BiSpikCLM, the first fully binary spiking causal language model to operate without matrix multiplication, targeting energy efficiency in large language models. The model features Softmax-Free Spiking Attention (SFSA), which eliminates softmax and floating-point operations, and Spike-Aware Alignment Distillation (SpAD), which enables effective training by aligning an artificial neural network (ANN) teacher with a spiking neural network (SNN) student at multiple levels. The goal is to match the performance of ANN counterparts while substantially reducing power consumption.
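The paper's implementation is not reproduced here; the following minimal NumPy sketch only illustrates the general idea of attention computed over binary spikes without softmax. The function names, the Heaviside spike model, the threshold, and the plain causal mask are all assumptions for illustration, not the authors' design:

```python
import numpy as np

def heaviside_spikes(x, threshold=0.5):
    """Binarize activations into {0, 1} spikes (assumed spiking neuron model)."""
    return (x >= threshold).astype(np.int32)

def sfsa_sketch(q, k, v):
    """Hypothetical softmax-free spiking attention over binary spike tensors.

    q, k, v have shape (seq_len, d) with entries in {0, 1}. With binary
    inputs, the products below reduce to counting coincident spikes,
    i.e. pure integer additions -- no floating point and no softmax.
    """
    seq_len = q.shape[0]
    scores = q @ k.T                                        # integer spike-coincidence counts
    mask = np.tril(np.ones((seq_len, seq_len), dtype=np.int32))
    scores = scores * mask                                  # causal masking, still integer ops
    return scores @ v                                       # integer accumulation instead of
                                                            # softmax-weighted averaging

rng = np.random.default_rng(0)
q = heaviside_spikes(rng.random((8, 16)))
k = heaviside_spikes(rng.random((8, 16)))
v = heaviside_spikes(rng.random((8, 16)))
print(sfsa_sketch(q, k, v).shape)  # (8, 16)
```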
Key facts
- BiSpikCLM is the first fully binary spiking MatMul-free causal language model.
- It uses Softmax-Free Spiking Attention (SFSA) to eliminate softmax and floating-point operations.
- Spike-Aware Alignment Distillation (SpAD) aligns the ANN teacher and SNN student across embeddings, attention maps, intermediate features, and output logits (see the loss sketch after this list).
- The model targets energy efficiency for large language models.
- Spiking Neural Networks (SNNs) are event-driven and operate at ultra-low power.
- Existing spiking LLMs still depend on compute-intensive floating-point matrix multiplications and nonlinearities.
- The approach aims to achieve performance comparable to ANN counterparts.
- The paper is available on arXiv under ID 2605.13859.
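As a rough illustration of the multi-level teacher-student alignment described above, here is a hedged PyTorch sketch. The use of MSE for embeddings, attention maps, and intermediate features, KL divergence for logits, and the loss weights are common distillation choices assumed here, not the paper's formulation:

```python
import torch
import torch.nn.functional as F

def spad_loss_sketch(teacher, student, weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical multi-level alignment loss in the spirit of SpAD.

    `teacher` and `student` are dicts holding the four alignment targets
    named in the paper: embeddings, attention maps, intermediate
    features, and output logits. Loss forms and weights are assumptions.
    """
    w_emb, w_attn, w_feat, w_logit = weights
    loss = w_emb * F.mse_loss(student["embeddings"], teacher["embeddings"])
    loss = loss + w_attn * F.mse_loss(student["attention"], teacher["attention"])
    loss = loss + w_feat * F.mse_loss(student["features"], teacher["features"])
    # Logit alignment via KL divergence, a standard distillation choice.
    loss = loss + w_logit * F.kl_div(
        F.log_softmax(student["logits"], dim=-1),
        F.softmax(teacher["logits"], dim=-1),
        reduction="batchmean",
    )
    return loss

keys = ["embeddings", "attention", "features", "logits"]
teacher = {k: torch.randn(4, 10) for k in keys}
student = {k: torch.randn(4, 10, requires_grad=True) for k in keys}
print(spad_loss_sketch(teacher, student).item())
```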
Entities
Institutions
- arXiv