EdgeRazor: Lightweight LLMs via Mixed-Precision Quantization-Aware Distillation
EdgeRazor is a lightweight framework for deploying large language models (LLMs) on resource-constrained devices. It addresses the limitations of existing quantization methods, namely Post-Training Quantization (PTQ), Quantization-Aware Training (QAT), and Quantization-Aware Distillation, by combining mixed-precision quantization with knowledge distillation from a full-precision teacher model. Unlike prior approaches, which rely on manual feature selection and teacher-specific data, EdgeRazor automates feature selection and removes the dependence on teacher-specific data, reducing computational demands while maintaining model accuracy. The framework is detailed in an arXiv paper (2605.04062).
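For context, quantization-aware distillation typically trains a quantized student against a full-precision teacher's soft targets alongside the ordinary task loss. The PyTorch sketch below shows that generic recipe only; the function name and the `temperature` and `alpha` hyperparameters are illustrative assumptions, not EdgeRazor's actual objective.

```python
# Minimal sketch of a generic quantization-aware distillation loss.
# This is the standard teacher-student recipe, NOT EdgeRazor's method.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,   # assumed value
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend hard-label cross-entropy with soft-label KL to the teacher."""
    # Hard-label term: standard cross-entropy on ground-truth tokens.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened teacher
    # and student distributions, scaled by T^2 as in Hinton et al. (2015).
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kl
```

The student here would run with fake-quantized weights during training (see the second sketch below), while the teacher stays at full precision and provides the soft targets.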
Key facts
- EdgeRazor is proposed for LLM deployment on resource-constrained devices.
- It uses mixed-precision quantization-aware distillation.
- Existing methods include PTQ, QAT, and Quantization-Aware Distillation.
- PTQ suffers accuracy degradation below 4-bit precision.
- QAT requires substantial computational resources.
- Quantization-Aware Distillation relies on manual feature selection and teacher-specific data.
- EdgeRazor automates feature selection and reduces computational demands.
- The paper is available on arXiv with ID 2605.04062.
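For intuition on the mixed-precision side, the sketch below shows per-tensor fake quantization with a straight-through estimator (STE), the usual building block for mixed-precision QAT. The layer names and per-layer bit-widths are hypothetical assumptions for illustration; EdgeRazor's automated allocation policy is described in the paper.

```python
# Hypothetical sketch: uniform symmetric fake quantization with an STE.
# Bit assignments are illustrative, not EdgeRazor's learned allocation.
import torch

class FakeQuantSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w: torch.Tensor, bits: int) -> torch.Tensor:
        # Symmetric uniform quantization to `bits` bits (assumes bits >= 2).
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pass gradients straight through the non-differentiable rounding.
        return grad_output, None

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    return FakeQuantSTE.apply(w, bits)

# Hypothetical mixed-precision map: sensitive layers keep more bits.
layer_bits = {"attn.q_proj": 8, "attn.k_proj": 4, "mlp.up_proj": 2}

w = torch.randn(16, 16, requires_grad=True)
w_q = fake_quantize(w, layer_bits["attn.k_proj"])  # 4-bit fake quantization
```

During training, each student layer forwards through its fake-quantized weights, and the distillation loss above backpropagates through the STE into the full-precision shadow weights.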