ARTFEED — Contemporary Art Intelligence

ActQuant: Sub-4-bit Quantization for Vision-Language-Action Models

ai-technology · 2026-05-26

ActQuant is a post-training quantization (PTQ) framework for Vision-Language-Action (VLA) models that enables sub-4-bit weight quantization, reducing the computational demands of deploying these models on edge devices. It works in two stages: an inter-tensor bit allocator assigns a bit-width to each weight matrix according to its contribution to action prediction, and an intra-tensor scale optimizer then tunes per-block quantization scales using action-aware curvature. The framework also includes OmniModel.cpp, an agentic conversion pipeline for on-device deployment.
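The inter-tensor stage can be pictured as a budgeted allocation problem. The sketch below is not ActQuant's algorithm: it assumes each weight matrix comes with a scalar sensitivity score (a hypothetical stand-in for its contribution to action prediction), models quantization error as shrinking about fourfold per extra bit, and greedily upgrades whichever matrix buys the most error reduction per extra bit until an average-bit budget is spent. `Tensor`, `allocate_bits`, and the layer names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    num_params: int
    sensitivity: float  # hypothetical proxy for contribution to action prediction


def distortion(t: Tensor, bits: int) -> float:
    # Assumed error model: uniform-quantization noise power drops ~4x per extra bit.
    return t.sensitivity * 4.0 ** (-bits)


def allocate_bits(tensors, avg_bits, choices=(2, 3, 4)):
    """Greedily upgrade bit-widths until the average-bit budget is spent."""
    choices = sorted(choices)
    total_params = sum(t.num_params for t in tensors)
    budget = avg_bits * total_params          # total weight bits allowed
    level = {t.name: 0 for t in tensors}      # index into `choices`, start at lowest
    used = choices[0] * total_params
    while True:
        best, best_gain = None, 0.0
        for t in tensors:
            i = level[t.name]
            if i + 1 >= len(choices):
                continue                      # already at the highest bit-width
            extra = (choices[i + 1] - choices[i]) * t.num_params
            if used + extra > budget:
                continue                      # upgrade would exceed the budget
            # Error reduction per extra bit spent on this matrix.
            gain = (distortion(t, choices[i]) - distortion(t, choices[i + 1])) / extra
            if gain > best_gain:
                best, best_gain = t, gain
        if best is None:
            break
        i = level[best.name]
        used += (choices[i + 1] - choices[i]) * best.num_params
        level[best.name] = i + 1
    return {name: choices[i] for name, i in level.items()}


# Hypothetical layers: a small but highly sensitive action head and a large MLP.
tensors = [
    Tensor("vision.qkv", 1000, 0.5),
    Tensor("llm.mlp", 4000, 1.0),
    Tensor("action_head", 500, 8.0),
]
print(allocate_bits(tensors, avg_bits=3.0))
# → {'vision.qkv': 4, 'llm.mlp': 2, 'action_head': 4}
```

In this toy case the action head and the vision projection end up at 4 bits while the large MLP stays at 2 bits. The greedy pass also leaves part of the 3-bit budget unused, because upgrading the MLP would overshoot it; an exact knapsack or integer-programming solver avoids that at higher cost.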

Key facts

  • ActQuant targets sub-4-bit weight quantization for VLA models.
  • It uses a two-stage PTQ framework: inter-tensor bit allocator and intra-tensor scale optimizer.
  • The inter-tensor allocator assigns per-matrix bit-widths based on each matrix's contribution to action prediction.
  • The intra-tensor optimizer uses action-aware curvature to concentrate dynamic range on influential weights.
  • OmniModel.cpp is an agentic conversion pipeline for on-device deployment.
  • Existing PTQ methods suffer severe performance degradation in the sub-4-bit regime.
  • VLA models show strong action-generation capabilities for embodied intelligence.
  • Heavy compute requirements make deploying VLA models on edge platforms impractical.
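The intra-tensor idea of concentrating dynamic range on influential weights can be approximated with a curvature-weighted scale search. This sketch is an illustrative substitute, not ActQuant's optimizer: each block gets a symmetric uniform quantizer, and its scale is picked from a grid of clipping ratios at or below the absmax scale by minimizing squared error weighted by a per-weight curvature estimate. The curvature values (for example, a diagonal Hessian or squared-gradient proxy computed from an action loss) are assumed inputs, and all function names and the block size are hypothetical.

```python
def quantize_block(weights, scale, bits):
    """Symmetric uniform quantize-dequantize of one block with a shared scale."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 3 for 3-bit, 1 for 2-bit
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def weighted_error(weights, curvature, scale, bits):
    """Curvature-weighted squared error: sum_i h_i * (w_i - Q(w_i))^2."""
    deq = quantize_block(weights, scale, bits)
    return sum(h * (w - d) ** 2 for w, d, h in zip(weights, deq, curvature))


def search_scale(weights, curvature, bits, grid=20, min_ratio=0.5):
    """Grid-search a clipping ratio in [min_ratio, 1] times the absmax scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return 1.0  # all-zero block: any positive scale is exact
    base = max_abs / qmax  # plain absmax scale, no clipping
    best_scale = base
    best_err = weighted_error(weights, curvature, base, bits)
    for k in range(grid):
        ratio = min_ratio + (1.0 - min_ratio) * k / max(grid - 1, 1)
        scale = base * ratio  # ratio < 1 clips the largest-magnitude weights
        err = weighted_error(weights, curvature, scale, bits)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale


def quantize_blockwise(weights, curvature, bits, block_size=64):
    """Split a flattened weight row into blocks and pick one scale per block."""
    scales, deq = [], []
    for start in range(0, len(weights), block_size):
        w = weights[start:start + block_size]
        h = curvature[start:start + block_size]
        s = search_scale(w, h, bits)
        scales.append(s)
        deq.extend(quantize_block(w, s, bits))
    return scales, deq


# Hypothetical block: four high-curvature weights plus one low-curvature outlier.
weights = [0.1, -0.2, 0.15, 0.05, 2.0]
curvature = [10.0, 10.0, 10.0, 10.0, 0.01]
absmax_scale = max(abs(w) for w in weights) / 3  # 3-bit absmax baseline
best = search_scale(weights, curvature, bits=3)
print(best, weighted_error(weights, curvature, best, 3),
      weighted_error(weights, curvature, absmax_scale, 3))
```

Because the outlier carries almost no curvature, the search prefers a scale below absmax: it clips the outlier and spends the few 3-bit levels on the high-curvature weights, which is roughly the kind of dynamic-range concentration the intra-tensor stage is described as targeting.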
