ARTFEED — Contemporary Art Intelligence

BiSpikCLM: First Fully Binary Spiking Language Model

ai-technology · 2026-05-16

A team of researchers has introduced BiSpikCLM, the first fully binary spiking causal language model to operate without matrix multiplication, targeting energy-efficient large language models. The model features Softmax-Free Spiking Attention (SFSA), which eliminates softmax and floating-point operations, and Spike-Aware Alignment Distillation (SpAD), which enables effective training by aligning an artificial neural network (ANN) teacher with the spiking neural network (SNN) student at multiple levels. The goal is to match the performance of ANN counterparts while substantially reducing power consumption.
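The article does not spell out the SFSA formulation, but the general idea behind softmax-free binary spiking attention can be illustrated: when queries, keys, and values are binary spike vectors, the query–key product reduces to an overlap count (logical AND plus popcount), and multiplying by a binary value reduces to a masked integer add, so no floating-point matrix multiply or softmax is required. The following is a minimal sketch under those assumptions; the function name, the causal masking, and the score-sum normalization are illustrative choices, not the paper's actual mechanism.

```python
def spiking_attention(q, k, v):
    """Softmax-free attention over binary spike sequences (sketch).

    q, k, v: lists of binary (0/1) vectors, one per token. Scores are
    integer overlap counts between query and key spikes; values are
    accumulated with masked integer adds and normalized by the total
    score per query instead of a softmax.
    """
    seq_len = len(q)
    out = []
    for i in range(seq_len):
        # Causal mask: query i attends only to positions j <= i.
        # Score = popcount of (q[i] AND k[j]) -- no float multiply.
        scores = [sum(a & b for a, b in zip(q[i], k[j])) for j in range(i + 1)]
        total = sum(scores) or 1  # guard against zero spike overlap
        acc = [0] * len(v[0])
        for j, s in enumerate(scores):
            # Binary values turn multiplication into a masked add.
            for d, spike in enumerate(v[j]):
                if spike:
                    acc[d] += s
        # Single normalization step replaces the softmax.
        out.append([a / total for a in acc])
    return out
```

In practice the division would itself be replaced by a shift or fixed-point scale on neuromorphic hardware; it is kept here only to make the sketch self-contained.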

Key facts

  • BiSpikCLM is the first fully binary spiking MatMul-free causal language model.
  • It uses Softmax-Free Spiking Attention (SFSA) to eliminate softmax and floating-point operations.
  • Spike-Aware Alignment Distillation (SpAD) aligns ANN teacher and SNN student across embeddings, attention maps, intermediate features, and output logits.
  • The model targets energy efficiency for large language models.
  • Spiking Neural Networks (SNNs) are event-driven and ultra-low power.
  • Existing spiking LLMs still require intensive floating-point matrix multiplication and nonlinearities.
  • The approach aims for performance comparable to ANN counterparts.
  • The paper is available on arXiv under ID 2605.13859.
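SpAD, as described above, aligns teacher and student at four levels: embeddings, attention maps, intermediate features, and output logits. The article gives no loss formulation, but a standard way to combine such levels is a weighted sum of per-level alignment terms plus a distribution-matching term on the logits. The sketch below uses mean-squared error for the feature levels and KL divergence for the logits; the dictionary keys, weights, and loss choices are assumptions for illustration, not the paper's actual objective.

```python
import math

def mse(a, b):
    """Mean-squared error between two flat vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def kl_div(p, q):
    """KL(p || q), with p the teacher distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def spad_loss(teacher, student, weights=(1.0, 1.0, 1.0, 1.0)):
    """Multi-level alignment loss (sketch).

    teacher/student: dicts with flat vectors for each alignment level,
    mirroring the article's four levels: "emb" (embeddings), "attn"
    (attention maps), "feat" (intermediate features), "logits".
    Keys and default weights are hypothetical.
    """
    w_emb, w_attn, w_feat, w_logit = weights
    loss = (w_emb * mse(teacher["emb"], student["emb"])
            + w_attn * mse(teacher["attn"], student["attn"])
            + w_feat * mse(teacher["feat"], student["feat"]))
    # Logits are matched as distributions rather than raw values.
    loss += w_logit * kl_div(softmax(teacher["logits"]),
                             softmax(student["logits"]))
    return loss
```

During training the student's spiking activations would typically be compared to the teacher through a surrogate (e.g. firing rates), since raw binary spikes are not directly differentiable; that detail is omitted here.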

Entities

Institutions

  • arXiv
