BitCal-TTS: Bit-Calibrated Test-Time Scaling for Quantized Reasoning Models
A new method, BitCal-TTS, addresses miscalibrated confidence in quantized large reasoning models during test-time compute allocation. Post-training quantization reduces memory and latency but distorts the model's confidence signals, causing it to halt reasoning prematurely. BitCal-TTS combines online uncertainty proxies, bit-conditioned confidence rescaling, and a confirmation horizon for structured final answers such as those in GSM8K. It requires no fine-tuning and integrates with standard inference pipelines.
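The two core runtime signals can be sketched as follows. This is an illustrative interpretation, not the paper's actual algorithm: `token_entropy` stands in for the online token-level uncertainty proxy, and `rescale_confidence` uses a hypothetical linear shrink schedule to make confidence conservative at low bit-widths.

```python
import math

def token_entropy(probs):
    """Shannon entropy of a next-token distribution: a simple online
    proxy for token-level uncertainty (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def rescale_confidence(raw_conf, bits, full_precision_bits=16):
    """Bit-conditioned rescaling: shrink raw confidence toward 0.5 more
    aggressively at lower precision, so a 4-bit model does not halt as
    eagerly as its (miscalibrated) raw confidence would suggest.
    The linear schedule below is an assumption for illustration."""
    shrink = min(1.0, bits / full_precision_bits)  # e.g. 4-bit -> 0.25
    return 0.5 + (raw_conf - 0.5) * shrink
```

Under this schedule, a raw confidence of 0.9 stays at 0.9 in full precision but is pulled down to 0.6 at 4 bits, which delays any confidence-threshold halting rule.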
Key facts
- BitCal-TTS is a lightweight runtime controller for quantized reasoning models.
- Post-training quantization can distort confidence signals in adaptive test-time compute allocation.
- Miscalibrated confidence leads to harmful early halting in greedy 4-bit inference.
- BitCal-TTS uses online proxies for token-level uncertainty and reasoning-trace stability.
- It applies bit-conditioned confidence rescaling that is conservative at low precision.
- Includes a bit-aware post-marker confirmation horizon designed for GSM8K-style outputs.
- No fine-tuning of the base model is required.
- The method integrates with standard inference pipelines.
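The post-marker confirmation horizon listed above can be sketched as a small halting controller. GSM8K-style solutions end with a `####` answer marker; the sketch below waits a bit-dependent number of extra tokens after that marker before allowing a halt. The `confirmation_horizon` formula and the `HaltController` class are illustrative assumptions, not the paper's implementation.

```python
def confirmation_horizon(bits, base=4):
    """Bit-aware horizon: lower precision -> more confirmation tokens.
    The formula is an assumption for illustration (4-bit -> 16 tokens)."""
    return base * (16 // bits)

class HaltController:
    """Sketch of a post-marker confirmation loop for GSM8K-style outputs,
    where the final answer follows a '####' marker."""

    def __init__(self, bits, marker="####"):
        self.horizon = confirmation_horizon(bits)
        self.marker = marker
        self.seen_marker = False
        self.tokens_after_marker = 0

    def step(self, text_so_far):
        """Call after each generated token; returns True when it is
        safe to halt (marker seen and the horizon has elapsed)."""
        if not self.seen_marker and self.marker in text_so_far:
            self.seen_marker = True
        elif self.seen_marker:
            self.tokens_after_marker += 1
        return self.seen_marker and self.tokens_after_marker >= self.horizon
```

Because the controller only inspects generated text and token counts, it composes with any decoding loop and requires no change to the quantized model itself, matching the "no fine-tuning" claim.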