ARTFEED — Contemporary Art Intelligence

BitCal-TTS: Bit-Calibrated Test-Time Scaling for Quantized Reasoning Models

other · 2026-05-09

A new method, BitCal-TTS, addresses the problem of miscalibrated confidence in quantized large reasoning models during test-time compute allocation. Post-training quantization reduces memory and latency but distorts confidence signals, causing models to halt reasoning prematurely. BitCal-TTS combines online uncertainty proxies, bit-conditioned confidence rescaling, and a confirmation horizon for GSM8K-style structured outputs. It requires no fine-tuning and integrates with standard inference pipelines.
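To make the idea concrete, here is a minimal sketch of the two core pieces named above: a token-level uncertainty proxy (plain Shannon entropy here) and a bit-conditioned rescaling that shrinks raw confidence more aggressively at lower precision before the halting check. All names, the `alpha` knob, and the linear shrink rule are illustrative assumptions, not the paper's actual formulation.

```python
import math

def token_entropy(probs):
    # Shannon entropy of a next-token distribution: one simple
    # online proxy for token-level uncertainty.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def rescale_confidence(raw_conf, bits, full_bits=16, alpha=0.15):
    # Bit-conditioned rescaling (hypothetical form): shrink raw
    # confidence in proportion to how far the bit-width sits below
    # the full-precision baseline, so 4-bit inference is treated
    # more conservatively than 16-bit.
    shrink = 1.0 - alpha * max(0, full_bits - bits) / full_bits
    return raw_conf * shrink

def should_halt(raw_conf, bits, threshold=0.9):
    # Halt test-time reasoning only if the *rescaled* confidence
    # clears the threshold, not the raw (miscalibrated) one.
    return rescale_confidence(raw_conf, bits) >= threshold
```

With these toy settings, a raw confidence of 0.95 halts at 16-bit but keeps reasoning at 4-bit (0.95 rescales to about 0.84), which is the "conservative at low precision" behavior the method targets.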

Key facts

  • BitCal-TTS is a lightweight runtime controller for quantized reasoning models.
  • Post-training quantization can distort confidence signals in adaptive test-time compute allocation.
  • Miscalibrated confidence leads to harmful early halting in greedy 4-bit inference.
  • BitCal-TTS uses online proxies for token-level uncertainty and reasoning-trace stability.
  • It applies bit-conditioned confidence rescaling that is conservative at low precision.
  • Includes a bit-aware post-marker confirmation horizon designed for GSM8K-style outputs.
  • No fine-tuning of the base model is required.
  • The method integrates with standard inference pipelines.
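The bit-aware post-marker confirmation horizon listed above can be sketched as follows for GSM8K-style outputs, where the final answer follows a `####` marker. The horizon length, the agreement rule, and both function names are hypothetical stand-ins for whatever the paper specifies: the only point illustrated is that lower bit-widths demand a longer confirmation window before the answer is accepted.

```python
def confirmation_horizon(bits, base_k=2, full_bits=16):
    # Hypothetical bit-aware horizon: require more confirmation
    # tokens after the answer marker at lower precision.
    return base_k + max(0, (full_bits - bits) // 4)

def accept_answer(trace, bits, marker="####"):
    # Accept the post-marker answer only once the horizon has
    # elapsed and the tokens inside it agree on a single answer.
    # `trace` is a list of generated tokens; everything after
    # `marker` is treated as the candidate answer region
    # (a deliberate simplification for illustration).
    if marker not in trace:
        return None
    tail = trace[trace.index(marker) + 1:]
    k = confirmation_horizon(bits)
    if len(tail) < k:
        return None  # horizon not yet elapsed: keep decoding
    if all(t == tail[0] for t in tail[:k]):
        return tail[0]
    return None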