HiF8 W8A8 QAT Failure Modes in OpenPangu-Embedded-1B
A study of quantization-aware training (QAT) with HiF8 W8A8 for OpenPangu-Embedded-1B identifies two orthogonal failure modes: amax saturation, in which the delayed scale estimates used by Delayed Tensor Scaling (DTS) cause clipping in the forward pass, and catastrophic forgetting caused by an aggressive learning rate. Neither is detectable from the training loss. For amax saturation, the authors propose a conservative max-algorithm DTS over a 64-step window; for forgetting, they propose a 500-step BF16 warmup with lr=10^{-5}. According to the study, both fixes are necessary, and together they are sufficient.
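The max-over-window idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `MaxWindowScaler`, the parameter `fmt_max` (the largest finite magnitude of the 8-bit format), and the value 16.0 in the usage note are assumptions made for the example. The scale for the current step is derived only from amax values recorded at earlier steps, and taking the maximum over the window keeps the scale small enough to cover a recent spike.

```python
from collections import deque


class MaxWindowScaler:
    """Delayed scaling with a max-over-history amax estimate (illustrative sketch)."""

    def __init__(self, fmt_max, window=64):
        # fmt_max: largest finite magnitude of the target format (assumed input).
        self.fmt_max = float(fmt_max)
        # Only the last `window` amax observations are kept.
        self.history = deque(maxlen=window)

    def scale(self):
        """Scale for the current step, computed only from past steps."""
        if not self.history:
            return 1.0
        amax = max(self.history)  # conservative: largest amax in the window
        return self.fmt_max / amax if amax > 0.0 else 1.0

    def update(self, current_amax):
        """Record the amax observed at the step that just ran."""
        self.history.append(float(current_amax))

    def saturates(self, current_amax):
        """True if a tensor with this amax would clip under the delayed scale."""
        return current_amax * self.scale() > self.fmt_max
```

With the placeholder `fmt_max=16.0` and the amax history 1.0, 8.0, 1.0, a scaler with `window=64` still remembers the spike and does not clip a tensor whose amax is 4.0, whereas a scaler with `window=1`, which only sees the most recent amax, does.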
Key facts
- arXiv:2605.26189v1
- HiF8 W8A8 QAT for OpenPangu-Embedded-1B
- Delayed Tensor Scaling (DTS) used
- Two failure modes: amax saturation and catastrophic forgetting
- amax saturation caused by delayed scale estimates
- Catastrophic forgetting from aggressive learning rate
- Conservative max-algorithm DTS over 64-step window proposed
- 500-step BF16 warmup with lr=10^{-5} proposed
Entities
Institutions
- arXiv