ARTFEED — Contemporary Art Intelligence

BWLA: 1-bit Weights and Low-bit Activations for LLMs

ai-technology · 2026-05-04

Researchers have introduced BWLA (Binarized Weights and Low-bit Activations), a post-training quantization framework for large language models (LLMs). The method binarizes weights to 1 bit and quantizes activations to low precision, such as 6 bits, while maintaining high accuracy. Existing binarization techniques struggle with heavy-tailed activations, which force activations to stay at high precision and prevent end-to-end acceleration. BWLA addresses this with the Orthogonal-Kronecker Transformation (OKT), which constructs an orthogonal mapping via EM minimization, reshaping unimodal weight distributions into symmetric bimodal ones and suppressing activation tails and incoherence. A Proximal SVD Projection (PSP) then provides lightweight low-rank refinement. Details are in arXiv:2605.00422v1.
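
To make the pairing of 1-bit weights with low-bit activations concrete, here is a minimal sketch in NumPy, assuming a generic sign-based binarization with a per-row scale and symmetric uniform 6-bit activation quantization; the function names and the scaling scheme are illustrative stand-ins, not the paper's exact formulation.

```python
import numpy as np

def binarize_weights(W):
    """Sign-based 1-bit weight quantization with a per-output-row scale
    (generic scheme, not the exact BWLA formulation)."""
    alpha = np.abs(W).mean(axis=1, keepdims=True)   # per-row scale
    W_bin = np.sign(W)
    W_bin[W_bin == 0] = 1.0                          # map exact zeros to +1
    return W_bin, alpha                              # W is approximated by alpha * W_bin

def quantize_activations(X, bits=6):
    """Symmetric uniform low-bit activation quantization (e.g. 6-bit)."""
    qmax = 2 ** (bits - 1) - 1                       # 31 for 6 bits
    scale = np.abs(X).max() / qmax + 1e-12
    X_q = np.clip(np.round(X / scale), -qmax, qmax)
    return X_q, scale

# Toy usage: compare a full-precision matmul with its quantized approximation.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(4, 16))

W_bin, alpha = binarize_weights(W)
X_q, s = quantize_activations(X, bits=6)

y_fp = X @ W.T
y_q = (s * X_q) @ (alpha * W_bin).T                  # dequantized approximation
print("relative error:", np.linalg.norm(y_fp - y_q) / np.linalg.norm(y_fp))
```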

Key facts

  • BWLA stands for Binarized Weights and Low-bit Activations
  • It is a post-training quantization framework for LLMs
  • Achieves 1-bit weight quantization with low-bit activations (e.g., 6 bits)
  • Uses Orthogonal-Kronecker Transformation (OKT) for orthogonal mapping via EM minimization (see the rotation sketch after this list)
  • OKT converts unimodal weights to symmetric bimodal forms
  • OKT suppresses activation tails and incoherence
  • Uses Proximal SVD Projection (PSP) for lightweight low-rank refinement (see the low-rank sketch after this list)
  • Paper published on arXiv with ID 2605.00422v1
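
The summary does not spell out how the Orthogonal-Kronecker Transformation is constructed or how its EM minimization is set up, but its basic ingredient can be sketched: an orthogonal matrix formed as a Kronecker product of small orthogonal factors, applied as a rotation that leaves the layer's output unchanged while mixing heavy-tailed activation channels. The sketch below uses random orthogonal factors purely for illustration; in BWLA the transform is optimized, and all names and dimensions here are assumptions.

```python
import numpy as np

def random_orthogonal(n, seed=0):
    """Small orthogonal factor via QR of a random Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

def kronecker_orthogonal(n1, n2, seed=0):
    """Orthogonal transform on an (n1*n2)-dim space built as a Kronecker
    product of two small factors; the Kronecker product of orthogonal
    matrices is itself orthogonal."""
    return np.kron(random_orthogonal(n1, seed), random_orthogonal(n2, seed + 1))

# Rotate weights and activations with the same orthogonal Q.
# Because Q is orthogonal, (X Q)(W Q)^T = X W^T, so the layer's output
# is preserved while heavy-tailed activation channels get mixed.
n1, n2 = 4, 8                                   # hidden dim 32, assumed for illustration
Q = kronecker_orthogonal(n1, n2)

rng = np.random.default_rng(1)
W = rng.normal(size=(16, n1 * n2))
X = rng.standard_t(df=3, size=(64, n1 * n2))    # heavy-tailed activations

W_rot, X_rot = W @ Q, X @ Q
print("max |X| before/after rotation:", np.abs(X).max(), np.abs(X_rot).max())
print("output drift:",
      np.linalg.norm(X @ W.T - X_rot @ W_rot.T) / np.linalg.norm(X @ W.T))
```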
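
The summary describes the Proximal SVD Projection only as lightweight low-rank refinement. A common form such refinement takes is a truncated-SVD correction of the quantization residual, optionally with singular-value soft-thresholding (the proximal operator of the nuclear norm); the sketch below follows that assumption and is not the paper's actual procedure.

```python
import numpy as np

def low_rank_residual_correction(W, W_quant, rank=4, tau=0.0):
    """Low-rank refinement of the quantization residual via truncated SVD.
    Generic stand-in for BWLA's Proximal SVD Projection, whose exact
    objective is not given in the summary; `tau` applies optional
    soft-thresholding to the kept singular values."""
    R = W - W_quant                                  # quantization error
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    S = np.maximum(S[:rank] - tau, 0.0)              # proximal shrinkage
    L = (U[:, :rank] * S) @ Vt[:rank]                # low-rank correction term
    return W_quant + L                               # refined approximation

# Toy usage: binarized weights plus a small low-rank correction track W more closely.
rng = np.random.default_rng(2)
W = rng.normal(size=(32, 32))
W_q = np.abs(W).mean() * np.sign(W)                  # crude 1-bit proxy
for r in (0, 4, 8):
    W_hat = low_rank_residual_correction(W, W_q, rank=r)
    print(f"rank {r}: error {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
```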

Entities

Institutions

  • arXiv

Sources