ARTFEED — Contemporary Art Intelligence

RQ-MoE: Residual Quantization via Mixture of Experts for Efficient Input-Dependent Vector Compression

other · 2026-05-16

RQ-MoE (Residual Quantization via Mixture of Experts) is a newly proposed framework for vector quantization. It combines a two-level mixture of experts with dual-stream quantization so that codebooks adapt to the input: codebooks are generated dynamically, and decoupling instruction from quantization enables parallel decoding. Standard Residual Quantization and QINCo are recovered as constrained special cases, and the authors derive a guideline for setting expert dimensionality. Extensive experiments show that the framework compresses high-dimensional embeddings effectively, addressing the static codebooks and sequential dependencies that limit existing methods.
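
The paper itself is not reproduced here, but the core idea of input-dependent residual quantization can be illustrated with a minimal sketch. Everything below is a hypothetical reading: the hard top-1 routing, the names (codebooks, router_w, quantize, decode), and the per-stage expert codebooks are assumptions made for illustration, not the authors' actual RQ-MoE architecture.

    import numpy as np

    rng = np.random.default_rng(0)

    D, K, E, STAGES = 64, 256, 4, 4   # vector dim, codewords per book, experts, RQ stages

    # One codebook per (stage, expert); a router chooses the expert from the residual,
    # which is one simple way to make the effective codebook input-dependent.
    codebooks = rng.normal(size=(STAGES, E, K, D)).astype(np.float32)
    router_w = rng.normal(size=(STAGES, D, E)).astype(np.float32)

    def quantize(x):
        """Greedy residual quantization with an input-dependent codebook per stage."""
        residual, codes = x.copy(), []
        for s in range(STAGES):
            expert = int(np.argmax(residual @ router_w[s]))              # pick expert codebook
            book = codebooks[s, expert]                                  # (K, D)
            idx = int(np.argmin(((book - residual) ** 2).sum(axis=1)))   # nearest codeword
            codes.append((expert, idx))
            residual = residual - book[idx]
        return codes

    def decode(codes):
        """Each stage is reconstructed from its stored (expert, index) pair alone."""
        return sum(codebooks[s, e, i] for s, (e, i) in enumerate(codes))

    x = rng.normal(size=D).astype(np.float32)
    codes = quantize(x)
    print("codes:", codes)
    print("reconstruction error:", float(np.linalg.norm(x - decode(codes))))

Because each stage stores its own (expert, codeword) pair, decoding a stage needs nothing from previously decoded stages; that is one way to read the claim that decoupling the routing decision ("instruction") from quantization enables parallel decoding.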

Key facts

  • RQ-MoE combines a two-level MoE with dual-stream quantization.
  • It enables input-dependent codebook adaptation for vector quantization.
  • RQ-MoE facilitates parallel decoding by decoupling instruction from quantization.
  • Standard Residual Quantization and QINCo are special cases of RQ-MoE (the plain-RQ baseline is sketched after this list).
  • A guideline for setting expert dimensionality is derived.
  • Extensive experiments show effective compression of high-dimensional embeddings.
  • The work is published on arXiv with ID 2605.14359.
  • The method addresses limitations of static codebooks and sequential dependencies.
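
To make the special-case claim above concrete, the baseline can be sketched as well: plain residual quantization uses one static codebook per stage and no routing. As before, this is an illustrative sketch under assumed names and dimensions, not code from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    D, K, STAGES = 64, 256, 4

    # A single static codebook per stage: the standard RQ baseline that RQ-MoE is
    # reported to recover when its codebook adaptation is constrained away.
    codebooks = rng.normal(size=(STAGES, K, D)).astype(np.float32)

    def rq_encode(x):
        residual, codes = x.copy(), []
        for s in range(STAGES):
            idx = int(np.argmin(((codebooks[s] - residual) ** 2).sum(axis=1)))
            codes.append(idx)
            residual = residual - codebooks[s, idx]
        return codes

    def rq_decode(codes):
        return sum(codebooks[s, i] for s, i in enumerate(codes))

    x = rng.normal(size=D).astype(np.float32)
    print("plain RQ error:", float(np.linalg.norm(x - rq_decode(rq_encode(x)))))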

Entities

Institutions

  • arXiv

Sources