ActQuant: Sub-4-bit Quantization for Vision-Language-Action Models
ActQuant is a post-training quantization (PTQ) framework for Vision-Language-Action (VLA) models. It quantizes weights below 4 bits to cut the compute cost of edge deployment. It works in two stages: an inter-tensor bit allocator assigns a bit-width to each weight matrix according to its contribution to action prediction, and an intra-tensor scale optimizer tunes per-block scales using action-aware curvature. The framework also includes OmniModel.cpp, an agentic conversion pipeline for on-device deployment.
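As a rough illustration of the inter-tensor stage, the sketch below allocates bit-widths greedily under an average-bit budget. It is not ActQuant's actual allocator. The tensor names, the sensitivity scores (standing in for "contribution to action prediction"), the candidate bit-widths, and the assumption that quantization error shrinks as 4^(-bits) are all hypothetical choices made for the example.

```python
def allocate_bits(sensitivity, num_params, avg_bits_budget, choices=(2, 3, 4)):
    """Greedy bit allocation under an average-bit budget (illustrative only).

    sensitivity: hypothetical per-tensor score for contribution to action prediction.
    num_params:  number of weights in each tensor.
    Assumes quantization error scales as 4 ** (-bits), a common rule of thumb.
    """
    names = list(sensitivity)
    level = {n: 0 for n in names}  # index into `choices`, start at the lowest width
    budget = avg_bits_budget * sum(num_params[n] for n in names)
    used = sum(choices[0] * num_params[n] for n in names)
    if used > budget:
        raise ValueError("budget is below the smallest candidate bit-width")

    while True:
        best = None
        for n in names:
            if level[n] + 1 >= len(choices):
                continue  # already at the widest candidate
            cur, nxt = choices[level[n]], choices[level[n] + 1]
            extra = (nxt - cur) * num_params[n]
            if used + extra > budget:
                continue  # upgrade would exceed the budget
            # Estimated error reduction per extra bit of storage.
            gain = sensitivity[n] * (4.0 ** -cur - 4.0 ** -nxt) / extra
            if best is None or gain > best[0]:
                best = (gain, n, extra)
        if best is None:
            break  # no affordable upgrade left
        _, n, extra = best
        level[n] += 1
        used += extra
    return {n: choices[level[n]] for n in names}


# Hypothetical tensors: the action head matters most, the MLP least.
sens = {"action_head": 10.0, "vision_proj": 1.0, "lm_mlp": 0.1}
size = {"action_head": 100, "vision_proj": 100, "lm_mlp": 100}
bits = allocate_bits(sens, size, avg_bits_budget=3.0)
print(bits)  # {'action_head': 4, 'vision_proj': 3, 'lm_mlp': 2}
```

With equal tensor sizes and an average budget of 3 bits, the most sensitive tensor ends up at 4 bits and the least sensitive at 2, so the average stays at the budget.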
Key facts
- ActQuant targets sub-4-bit weight quantization for VLA models.
- It is a two-stage PTQ framework: an inter-tensor bit allocator and an intra-tensor scale optimizer.
- The inter-tensor allocator assigns bit-widths based on contribution to action prediction.
- The intra-tensor optimizer uses action-aware curvature to concentrate dynamic range on influential weights.
- OmniModel.cpp is an agentic conversion pipeline for on-device deployment.
- Existing PTQ methods suffer severe performance degradation in the sub-4-bit regime.
- VLA models show strong action-generation capability for embodied intelligence.
- Their heavy compute cost makes VLA deployment on edge platforms impractical.
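The intra-tensor idea, concentrating dynamic range on influential weights, can be illustrated with a small per-block scale search. This is a minimal sketch under stated assumptions, not ActQuant's optimizer. The per-weight curvature values are hypothetical stand-ins for an action-aware, Hessian-diagonal-like importance, and the grid search over shrunken scales is a generic technique chosen for the example.

```python
def quantize_block(block, scale, bits):
    """Symmetric round-to-nearest quantization, returned in dequantized form."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(w / scale))) * scale for w in block]


def weighted_error(block, curvature, scale, bits):
    """Curvature-weighted squared error: sum of h_i * (w_i - q_i)^2."""
    deq = quantize_block(block, scale, bits)
    return sum(h * (w - q) ** 2 for w, q, h in zip(block, deq, curvature))


def search_scale(block, curvature, bits, num_candidates=64):
    """Grid-search a per-block scale that minimizes the weighted error.

    Candidates are fractions of the max-abs scale, so outliers may be clipped
    when that lowers the error on high-curvature weights.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in block)
    if max_abs == 0.0:
        return 1.0  # all-zero block: any positive scale works
    base = max_abs / qmax  # conventional max-abs scale
    best_scale = base
    best_err = weighted_error(block, curvature, base, bits)
    for i in range(1, num_candidates + 1):
        s = base * i / num_candidates
        err = weighted_error(block, curvature, s, bits)
        if err < best_err:
            best_scale, best_err = s, err
    return best_scale


# Hypothetical block: three small but influential weights and one large outlier.
block = [0.1, -0.12, 0.09, 1.0]
curvature = [100.0, 100.0, 100.0, 0.01]
scale = search_scale(block, curvature, bits=3)
```

In this toy block the search settles on a scale well below the max-abs scale: clipping the low-curvature outlier costs little, while the finer grid represents the high-curvature weights much more accurately.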
Entities
—