BitCal-TTS: Bit-Calibrated Test-Time Scaling for Quantized Reasoning Models
A new method, BitCal-TTS, addresses miscalibrated confidence in quantized large reasoning models during test-time compute allocation. Post-training quantization reduces memory and latency but distorts the model's confidence signals, causing it to halt reasoning prematurely. BitCal-TTS combines online uncertainty proxies, bit-conditioned confidence rescaling, and a confirmation horizon for structured final answers such as those in GSM8K. It requires no fine-tuning and integrates with standard inference pipelines.
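The two core runtime signals can be sketched as follows. This is an illustrative interpretation, not the paper's actual algorithm: `token_entropy` stands in for the online token-level uncertainty proxy, and `rescale_confidence` uses a hypothetical linear shrink schedule to make confidence conservative at low bit-widths.

```python
import math

def token_entropy(probs):
    """Shannon entropy of a next-token distribution: a simple online
    proxy for token-level uncertainty (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def rescale_confidence(raw_conf, bits, full_precision_bits=16):
    """Bit-conditioned rescaling: shrink raw confidence toward 0.5 more
    aggressively at lower precision, so a 4-bit model does not halt as
    eagerly as its (miscalibrated) raw confidence would suggest.
    The linear schedule below is an assumption for illustration."""
    shrink = min(1.0, bits / full_precision_bits)  # e.g. 4-bit -> 0.25
    return 0.5 + (raw_conf - 0.5) * shrink
```

Under this schedule, a raw confidence of 0.9 stays at 0.9 in full precision but is pulled down to 0.6 at 4 bits, which delays any confidence-threshold halting rule.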
Key facts
- BitCal-TTS is a lightweight runtime controller for quantized reasoning models.
- Post-training quantization can distort confidence signals in adaptive test-time compute allocation.
- Miscalibrated confidence leads to harmful early halting in greedy 4-bit inference.
- BitCal-TTS uses online proxies for token-level uncertainty and reasoning-trace stability.
- It applies bit-conditioned confidence rescaling that is conservative at low precision.
- Includes a bit-aware post-marker confirmation horizon designed for GSM8K-style outputs.
- No fine-tuning of the base model is required.
- The method integrates with standard inference pipelines.
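The post-marker confirmation horizon listed above can be sketched as a small halting controller. GSM8K-style solutions end with a `####` answer marker; the sketch below waits a bit-dependent number of extra tokens after that marker before allowing a halt. The `confirmation_horizon` formula and the `HaltController` class are illustrative assumptions, not the paper's implementation.

```python
def confirmation_horizon(bits, base=4):
    """Bit-aware horizon: lower precision -> more confirmation tokens.
    The formula is an assumption for illustration (4-bit -> 16 tokens)."""
    return base * (16 // bits)

class HaltController:
    """Sketch of a post-marker confirmation loop for GSM8K-style outputs,
    where the final answer follows a '####' marker."""

    def __init__(self, bits, marker="####"):
        self.horizon = confirmation_horizon(bits)
        self.marker = marker
        self.seen_marker = False
        self.tokens_after_marker = 0

    def step(self, text_so_far):
        """Call after each generated token; returns True when it is
        safe to halt (marker seen and the horizon has elapsed)."""
        if not self.seen_marker and self.marker in text_so_far:
            self.seen_marker = True
        elif self.seen_marker:
            self.tokens_after_marker += 1
        return self.seen_marker and self.tokens_after_marker >= self.horizon
```

Because the controller only inspects generated text and token counts, it composes with any decoding loop and requires no change to the quantized model itself, matching the "no fine-tuning" claim.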