ARTFEED — Contemporary Art Intelligence

BWLA: 1-bit Weights and Low-bit Activations for LLMs

ai-technology · 2026-05-04

Researchers have introduced BWLA (Binarized Weights and Low-bit Activations), a post-training quantization framework for large language models (LLMs). The method binarizes weights to 1 bit and quantizes activations to low precision, such as 6 bits, while maintaining high accuracy. Existing binarization techniques struggle with heavy-tailed activations, which force activations to stay at high precision and prevent end-to-end acceleration. BWLA addresses this with the Orthogonal-Kronecker Transformation (OKT), which constructs an orthogonal mapping via EM minimization, reshaping unimodal weight distributions into symmetric bimodal ones and suppressing activation tails and incoherence. A Proximal SVD Projection (PSP) then provides lightweight low-rank refinement. Details are in arXiv:2605.00422v1.
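
To make the pairing of 1-bit weights with low-bit activations concrete, here is a minimal sketch in NumPy, assuming a generic sign-based binarization with a per-row scale and symmetric uniform 6-bit activation quantization; the function names and the scaling scheme are illustrative stand-ins, not the paper's exact formulation.

```python
import numpy as np

def binarize_weights(W):
    """Sign-based 1-bit weight quantization with a per-output-row scale
    (generic scheme, not the exact BWLA formulation)."""
    alpha = np.abs(W).mean(axis=1, keepdims=True)   # per-row scale
    W_bin = np.sign(W)
    W_bin[W_bin == 0] = 1.0                          # map exact zeros to +1
    return W_bin, alpha                              # W is approximated by alpha * W_bin

def quantize_activations(X, bits=6):
    """Symmetric uniform low-bit activation quantization (e.g. 6-bit)."""
    qmax = 2 ** (bits - 1) - 1                       # 31 for 6 bits
    scale = np.abs(X).max() / qmax + 1e-12
    X_q = np.clip(np.round(X / scale), -qmax, qmax)
    return X_q, scale

# Toy usage: compare a full-precision matmul with its quantized approximation.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(4, 16))

W_bin, alpha = binarize_weights(W)
X_q, s = quantize_activations(X, bits=6)

y_fp = X @ W.T
y_q = (s * X_q) @ (alpha * W_bin).T                  # dequantized approximation
print("relative error:", np.linalg.norm(y_fp - y_q) / np.linalg.norm(y_fp))
```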

Key facts

  • BWLA stands for Binarized Weights and Low-bit Activations
  • It is a post-training quantization framework for LLMs
  • Achieves 1-bit weight quantization with low-bit activations (e.g., 6 bits)
  • Uses Orthogonal-Kronecker Transformation (OKT) for orthogonal mapping via EM minimization (see the rotation sketch after this list)
  • OKT converts unimodal weights to symmetric bimodal forms
  • OKT suppresses activation tails and incoherence
  • Uses Proximal SVD Projection (PSP) for lightweight low-rank refinement (see the low-rank sketch after this list)
  • Paper published on arXiv with ID 2605.00422v1
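
The summary does not spell out how the Orthogonal-Kronecker Transformation is constructed or how its EM minimization is set up, but its basic ingredient can be sketched: an orthogonal matrix formed as a Kronecker product of small orthogonal factors, applied as a rotation that leaves the layer's output unchanged while mixing heavy-tailed activation channels. The sketch below uses random orthogonal factors purely for illustration; in BWLA the transform is optimized, and all names and dimensions here are assumptions.

```python
import numpy as np

def random_orthogonal(n, seed=0):
    """Small orthogonal factor via QR of a random Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

def kronecker_orthogonal(n1, n2, seed=0):
    """Orthogonal transform on an (n1*n2)-dim space built as a Kronecker
    product of two small factors; the Kronecker product of orthogonal
    matrices is itself orthogonal."""
    return np.kron(random_orthogonal(n1, seed), random_orthogonal(n2, seed + 1))

# Rotate weights and activations with the same orthogonal Q.
# Because Q is orthogonal, (X Q)(W Q)^T = X W^T, so the layer's output
# is preserved while heavy-tailed activation channels get mixed.
n1, n2 = 4, 8                                   # hidden dim 32, assumed for illustration
Q = kronecker_orthogonal(n1, n2)

rng = np.random.default_rng(1)
W = rng.normal(size=(16, n1 * n2))
X = rng.standard_t(df=3, size=(64, n1 * n2))    # heavy-tailed activations

W_rot, X_rot = W @ Q, X @ Q
print("max |X| before/after rotation:", np.abs(X).max(), np.abs(X_rot).max())
print("output drift:",
      np.linalg.norm(X @ W.T - X_rot @ W_rot.T) / np.linalg.norm(X @ W.T))
```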
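
The summary describes the Proximal SVD Projection only as lightweight low-rank refinement. A common form such refinement takes is a truncated-SVD correction of the quantization residual, optionally with singular-value soft-thresholding (the proximal operator of the nuclear norm); the sketch below follows that assumption and is not the paper's actual procedure.

```python
import numpy as np

def low_rank_residual_correction(W, W_quant, rank=4, tau=0.0):
    """Low-rank refinement of the quantization residual via truncated SVD.
    Generic stand-in for BWLA's Proximal SVD Projection, whose exact
    objective is not given in the summary; `tau` applies optional
    soft-thresholding to the kept singular values."""
    R = W - W_quant                                  # quantization error
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    S = np.maximum(S[:rank] - tau, 0.0)              # proximal shrinkage
    L = (U[:, :rank] * S) @ Vt[:rank]                # low-rank correction term
    return W_quant + L                               # refined approximation

# Toy usage: binarized weights plus a small low-rank correction track W more closely.
rng = np.random.default_rng(2)
W = rng.normal(size=(32, 32))
W_q = np.abs(W).mean() * np.sign(W)                  # crude 1-bit proxy
for r in (0, 4, 8):
    W_hat = low_rank_residual_correction(W, W_q, rank=r)
    print(f"rank {r}: error {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
```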

Entities

Institutions

  • arXiv

Sources