BiSpikCLM: First Fully Binary Spiking Language Model
A team of researchers has introduced BiSpikCLM, the first fully binary spiking causal language model to operate without matrix multiplication, targeting energy efficiency in large language models. The model features Softmax-Free Spiking Attention (SFSA), which eliminates softmax and floating-point operations, and Spike-Aware Alignment Distillation (SpAD), which enables effective training by aligning an artificial neural network (ANN) teacher with a spiking neural network (SNN) student at multiple levels. The goal is to match the performance of ANN counterparts while substantially reducing power consumption.
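The paper's implementation is not reproduced here; the following minimal NumPy sketch only illustrates the general idea of attention computed over binary spikes without softmax. The function names, the Heaviside spike model, the threshold, and the plain causal mask are all assumptions for illustration, not the authors' design:

```python
import numpy as np

def heaviside_spikes(x, threshold=0.5):
    """Binarize activations into {0, 1} spikes (assumed spiking neuron model)."""
    return (x >= threshold).astype(np.int32)

def sfsa_sketch(q, k, v):
    """Hypothetical softmax-free spiking attention over binary spike tensors.

    q, k, v have shape (seq_len, d) with entries in {0, 1}. With binary
    inputs, the products below reduce to counting coincident spikes,
    i.e. pure integer additions -- no floating point and no softmax.
    """
    seq_len = q.shape[0]
    scores = q @ k.T                                        # integer spike-coincidence counts
    mask = np.tril(np.ones((seq_len, seq_len), dtype=np.int32))
    scores = scores * mask                                  # causal masking, still integer ops
    return scores @ v                                       # integer accumulation instead of
                                                            # softmax-weighted averaging

rng = np.random.default_rng(0)
q = heaviside_spikes(rng.random((8, 16)))
k = heaviside_spikes(rng.random((8, 16)))
v = heaviside_spikes(rng.random((8, 16)))
print(sfsa_sketch(q, k, v).shape)  # (8, 16)
```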
Key facts
- BiSpikCLM is the first fully binary spiking MatMul-free causal language model.
- It uses Softmax-Free Spiking Attention (SFSA) to eliminate softmax and floating-point operations.
- Spike-Aware Alignment Distillation (SpAD) aligns the ANN teacher and SNN student across embeddings, attention maps, intermediate features, and output logits (see the loss sketch after this list).
- The model targets energy efficiency for large language models.
- Spiking Neural Networks (SNNs) are event-driven and operate at ultra-low power.
- Existing spiking LLMs still depend on compute-intensive floating-point matrix multiplications and nonlinearities.
- The approach aims to achieve performance comparable to ANN counterparts.
- The paper is available on arXiv under ID 2605.13859.
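As a rough illustration of the multi-level teacher-student alignment described above, here is a hedged PyTorch sketch. The use of MSE for embeddings, attention maps, and intermediate features, KL divergence for logits, and the loss weights are common distillation choices assumed here, not the paper's formulation:

```python
import torch
import torch.nn.functional as F

def spad_loss_sketch(teacher, student, weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical multi-level alignment loss in the spirit of SpAD.

    `teacher` and `student` are dicts holding the four alignment targets
    named in the paper: embeddings, attention maps, intermediate
    features, and output logits. Loss forms and weights are assumptions.
    """
    w_emb, w_attn, w_feat, w_logit = weights
    loss = w_emb * F.mse_loss(student["embeddings"], teacher["embeddings"])
    loss = loss + w_attn * F.mse_loss(student["attention"], teacher["attention"])
    loss = loss + w_feat * F.mse_loss(student["features"], teacher["features"])
    # Logit alignment via KL divergence, a standard distillation choice.
    loss = loss + w_logit * F.kl_div(
        F.log_softmax(student["logits"], dim=-1),
        F.softmax(teacher["logits"], dim=-1),
        reduction="batchmean",
    )
    return loss

keys = ["embeddings", "attention", "features", "logits"]
teacher = {k: torch.randn(4, 10) for k in keys}
student = {k: torch.randn(4, 10, requires_grad=True) for k in keys}
print(spad_loss_sketch(teacher, student).item())
```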
Entities
Institutions
- arXiv