ActQuant: Sub-4-bit Quantization for Vision-Language-Action Models

ai-technology · 2026-05-26

ActQuant is a post-training quantization (PTQ) framework for Vision-Language-Action (VLA) models. It targets sub-4-bit weight quantization to cut the compute cost of edge deployment. It works in two stages: an inter-tensor bit allocator assigns a bit-width to each weight matrix according to its contribution to action prediction, and an intra-tensor scale optimizer tunes per-block scales using action-aware curvature. The framework also includes OmniModel.cpp, an agentic conversion pipeline for on-device deployment.
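As a rough illustration of the inter-tensor stage, the sketch below greedily distributes bits across weight matrices under an average-bit budget. It is not ActQuant's actual algorithm. The function `allocate_bits`, the tensor names, the sensitivity scores, and the heuristic that each extra bit reduces quantization error by roughly 4x are all assumptions made for illustration. They stand in for whatever action-contribution measure the paper defines.

```python
def allocate_bits(sizes, sensitivity, avg_bits, choices=(2, 3, 4)):
    """Greedy mixed-precision allocation under an average-bit budget.

    sizes:       {tensor name: parameter count}
    sensitivity: {tensor name: hypothetical action-sensitivity score}
    """
    choices = sorted(choices)
    total_params = sum(sizes.values())
    level = {name: 0 for name in sizes}      # index into `choices`
    budget = avg_bits * total_params         # total bits allowed
    used = choices[0] * total_params         # every tensor starts at the lowest width

    while True:
        best_name, best_gain, best_cost = None, 0.0, 0
        for name, n_params in sizes.items():
            if level[name] + 1 >= len(choices):
                continue                     # already at the widest option
            step = choices[level[name] + 1] - choices[level[name]]
            cost = step * n_params
            if used + cost > budget:
                continue                     # this upgrade would exceed the budget
            # Assumed heuristic: error shrinks ~4x per upgrade, so later upgrades help less.
            gain = sensitivity[name] * 0.25 ** level[name] / cost
            if gain > best_gain:
                best_name, best_gain, best_cost = name, gain, cost
        if best_name is None:
            break                            # nothing affordable or useful is left
        level[best_name] += 1
        used += best_cost

    return {name: choices[level[name]] for name in sizes}


# Hypothetical tensors and scores, for illustration only.
sizes = {"vision_proj": 100, "llm_mlp": 400, "action_head": 100}
sensitivity = {"vision_proj": 1.0, "llm_mlp": 2.0, "action_head": 8.0}
print(allocate_bits(sizes, sensitivity, avg_bits=3.0))
```

With these made-up numbers, the small and highly sensitive `action_head` is upgraded first, and the large `llm_mlp` stays at 2 bits once the budget has been spent on cheaper upgrades.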

Key facts

ActQuant targets sub-4-bit weight quantization for VLA models.
It is a two-stage PTQ framework: an inter-tensor bit allocator and an intra-tensor scale optimizer.
The inter-tensor allocator assigns bit-widths based on contribution to action prediction.
The intra-tensor optimizer uses action-aware curvature to concentrate dynamic range on influential weights.
OmniModel.cpp is an agentic conversion pipeline for on-device deployment.
Existing PTQ methods suffer severe performance degradation in the sub-4-bit regime.
VLA models show strong action-generation capability for embodied intelligence.
The heavy compute requirements of VLA models make deployment on edge platforms impractical.
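The intra-tensor idea of concentrating dynamic range on influential weights can be sketched as a per-block scale search. The sketch minimizes a curvature-weighted quantization error rather than a plain one. This is a minimal illustration under stated assumptions, not the paper's optimizer: the helper names, the candidate grid, and the per-weight `curvature` values are hypothetical. The curvature values are imagined here as something like squared gradients of an action loss.

```python
def quantize(w, scale, bits):
    """Symmetric uniform quantization of one weight, then dequantization."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    q = max(qmin, min(qmax, round(w / scale)))
    return q * scale


def weighted_error(block, curvature, scale, bits):
    """Squared quantization error, weighted by a per-weight importance proxy."""
    return sum(h * (w - quantize(w, scale, bits)) ** 2
               for w, h in zip(block, curvature))


def absmax_scale(block, bits):
    """Baseline scale that maps the largest magnitude onto the top level."""
    qmax = 2 ** (bits - 1) - 1
    return max(abs(w) for w in block) / qmax


def search_scale(block, curvature, bits, steps=50):
    """Grid-search a block scale that minimizes the weighted error."""
    base = absmax_scale(block, bits)
    if base == 0.0:
        return 1.0                           # all-zero block: any scale works
    # Try the baseline plus shrunken scales that clip outliers to give
    # small but important weights a finer grid.
    candidates = [base] + [base * (0.3 + 0.7 * i / steps) for i in range(1, steps)]
    return min(candidates, key=lambda s: weighted_error(block, curvature, s, bits))


# Hypothetical block: one large outlier with low curvature, three small
# weights with high curvature.
block = [0.02, -0.05, 0.9, 0.01]
curvature = [5.0, 5.0, 0.01, 5.0]
best = search_scale(block, curvature, bits=3)
print(best, weighted_error(block, curvature, best, bits=3))
```

Because the baseline abs-max scale is always among the candidates, the searched scale never has a higher weighted error than the baseline. Whether it is actually lower depends on how the curvature is distributed within the block.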

Entities

—

Sources

arXiv cs.AI — 2026-05-26