ARTFEED — Contemporary Art Intelligence

EdgeRazor: Lightweight LLMs via Mixed-Precision Quantization-Aware Distillation

ai-technology · 2026-05-07

EdgeRazor is a lightweight framework for deploying large language models (LLMs) on resource-constrained devices. It addresses the limitations of three existing quantization approaches: Post-Training Quantization (PTQ), Quantization-Aware Training (QAT), and Quantization-Aware Distillation. The framework integrates mixed-precision quantization with knowledge distillation from a full-precision teacher model. Unlike prior approaches, which manually select features and depend on teacher-specific data, EdgeRazor automates feature selection, reducing computational demands while maintaining model accuracy. The method is detailed in a paper on arXiv (2605.04062).
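The combination described above, simulated ("fake") quantization plus a distillation loss against a full-precision teacher, can be sketched in plain Python. The paper's implementation is not reproduced here: the function names, the symmetric uniform quantizer, and the temperature value are illustrative assumptions, not EdgeRazor's actual method.

```python
import math

def fake_quantize(weights, bits):
    # Simulated quantization: snap each weight to the nearest level of a
    # symmetric uniform grid with 2**(bits - 1) - 1 positive levels, then
    # dequantize back to float (the standard trick in quantization-aware training).
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    return [round(w / scale) * scale for w in weights]

def softmax(logits, temperature=1.0):
    # Temperature-softened probabilities, as used for distillation targets.
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between the teacher's and the student's softened outputs;
    # zero when the two distributions match.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In a training loop, the quantized student's weights would pass through `fake_quantize` in the forward pass (with gradients flowing through a straight-through estimator), and `distillation_loss` would be added to the ordinary task loss.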

Key facts

  • EdgeRazor is proposed for LLM deployment on resource-constrained devices.
  • It uses mixed-precision quantization-aware distillation.
  • Existing methods include PTQ, QAT, and Quantization-Aware Distillation.
  • PTQ suffers accuracy degradation at precisions below 4 bits.
  • QAT requires substantial computational resources.
  • Quantization-Aware Distillation manually selects features and depends on teacher-specific data.
  • EdgeRazor automates feature selection and reduces computational demands.
  • The paper is available on arXiv with ID 2605.04062.
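The mixed-precision point above implies a per-layer bit-width assignment. The paper's allocation strategy is not described here, so the greedy scheme below is purely an illustrative assumption: keep every layer at a low bit-width, then upgrade the most quantization-sensitive layers while staying within an average bit budget.

```python
def assign_bitwidths(sensitivities, avg_budget=4, low=2, high=8):
    # Hypothetical greedy mixed-precision allocation: layers with higher
    # sensitivity scores are upgraded from `low` to `high` bits as long as
    # the average bit-width across layers stays within `avg_budget`.
    n = len(sensitivities)
    bits = [low] * n
    # Visit layers in order of descending sensitivity.
    order = sorted(range(n), key=lambda i: -sensitivities[i])
    for i in order:
        if (sum(bits) - bits[i] + high) / n <= avg_budget:
            bits[i] = high
    return bits
```

For example, `assign_bitwidths([0.9, 0.1, 0.5, 0.2])` upgrades only the most sensitive layer to 8 bits, since upgrading a second layer would push the average above the 4-bit budget.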

Entities

Institutions

  • arXiv
