EdgeRazor: Lightweight LLMs via Mixed-Precision Quantization-Aware Distillation
EdgeRazor is a lightweight framework for deploying large language models (LLMs) on resource-constrained devices. It addresses the limitations of existing quantization methods, namely Post-Training Quantization (PTQ), Quantization-Aware Training (QAT), and Quantization-Aware Distillation, by combining mixed-precision quantization with knowledge distillation from a full-precision teacher model. Unlike prior approaches, which rely on manual feature selection and teacher-specific data, EdgeRazor automates feature selection and removes the dependence on teacher-specific data, reducing computational demands while maintaining model accuracy. The framework is detailed in an arXiv paper (2605.04062).
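For context, quantization-aware distillation typically trains a quantized student against a full-precision teacher's soft targets alongside the ordinary task loss. The PyTorch sketch below shows that generic recipe only; the function name and the `temperature` and `alpha` hyperparameters are illustrative assumptions, not EdgeRazor's actual objective.

```python
# Minimal sketch of a generic quantization-aware distillation loss.
# This is the standard teacher-student recipe, NOT EdgeRazor's method.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,   # assumed value
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend hard-label cross-entropy with soft-label KL to the teacher."""
    # Hard-label term: standard cross-entropy on ground-truth tokens.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened teacher
    # and student distributions, scaled by T^2 as in Hinton et al. (2015).
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kl
```

The student here would run with fake-quantized weights during training (see the second sketch below), while the teacher stays at full precision and provides the soft targets.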
Key facts
- EdgeRazor is proposed for LLM deployment on resource-constrained devices.
- It uses mixed-precision quantization-aware distillation.
- Existing methods include PTQ, QAT, and Quantization-Aware Distillation.
- PTQ suffers accuracy degradation below 4-bit precision.
- QAT requires substantial computational resources.
- Quantization-Aware Distillation relies on manual feature selection and teacher-specific data.
- EdgeRazor automates feature selection and reduces computational demands.
- The paper is available on arXiv with ID 2605.04062.
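For intuition on the mixed-precision side, the sketch below shows per-tensor fake quantization with a straight-through estimator (STE), the usual building block for mixed-precision QAT. The layer names and per-layer bit-widths are hypothetical assumptions for illustration; EdgeRazor's automated allocation policy is described in the paper.

```python
# Hypothetical sketch: uniform symmetric fake quantization with an STE.
# Bit assignments are illustrative, not EdgeRazor's learned allocation.
import torch

class FakeQuantSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w: torch.Tensor, bits: int) -> torch.Tensor:
        # Symmetric uniform quantization to `bits` bits (assumes bits >= 2).
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pass gradients straight through the non-differentiable rounding.
        return grad_output, None

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    return FakeQuantSTE.apply(w, bits)

# Hypothetical mixed-precision map: sensitive layers keep more bits.
layer_bits = {"attn.q_proj": 8, "attn.k_proj": 4, "mlp.up_proj": 2}

w = torch.randn(16, 16, requires_grad=True)
w_q = fake_quantize(w, layer_bits["attn.k_proj"])  # 4-bit fake quantization
```

During training, each student layer forwards through its fake-quantized weights, and the distillation loss above backpropagates through the STE into the full-precision shadow weights.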