ARTFEED — Contemporary Art Intelligence

Entropy Centroids as Intrinsic Rewards for Test-Time Scaling

ai-technology · 2026-04-30

A new method uses entropy centroids as intrinsic rewards to scale test-time compute for large language models, avoiding the need for an external reward model. The approach builds on the observation that high-entropy tokens cluster into consecutive runs during inference, yielding a stable signal of model uncertainty. This temporal structure is formalized into segment-level rewards, offering an alternative to per-token confidence- or entropy-based scores, which are noisy. The work is published on arXiv under ID 2604.26173.
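The paper's exact formulation is not reproduced here. As a loose sketch only, assume "entropy centroid" means the mean entropy within each consecutive high-entropy run; the helper names (`token_entropy`, `high_entropy_segments`, `centroid_reward`) and the fixed threshold are illustrative choices, not the authors' API:

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one token's next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_entropy_segments(entropies, threshold):
    """Group consecutive above-threshold tokens into (start, end) runs,
    reflecting the observation that high-entropy tokens cluster."""
    segments, start = [], None
    for i, h in enumerate(entropies):
        if h >= threshold and start is None:
            start = i
        elif h < threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(entropies)))
    return segments

def centroid_reward(entropies, segments):
    """Hypothetical segment-level intrinsic reward: average the entropy
    centroid (mean entropy) of each run, negated so that lower average
    uncertainty inside uncertain spans scores higher."""
    if not segments:
        return 0.0
    centroids = [sum(entropies[a:b]) / (b - a) for a, b in segments]
    return -sum(centroids) / len(centroids)
```

Because the reward is computed from segment averages rather than individual tokens, a single spiky token perturbs the score far less than it would a per-token entropy sum.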

Key facts

  • Method uses entropy centroids as intrinsic rewards
  • Avoids external reward models
  • High-entropy tokens cluster into consecutive groups
  • Provides stable model uncertainty signals
  • Formalizes segment-level rewards
  • Published on arXiv: 2604.26173
  • Related to Grok Heavy and Gemini Deep Think
  • Addresses test-time compute scaling
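On the test-time-scaling side, an intrinsic reward of this kind would typically drive candidate selection. A minimal best-of-N sketch, assuming each candidate carries its own entropy trace and a caller-supplied `reward_fn` (both hypothetical here, not the paper's interface):

```python
def best_of_n(candidates, reward_fn):
    """Best-of-N test-time scaling: sample N completions, score each with
    an intrinsic reward derived from the model's own entropy trace, and
    keep the argmax. No external reward model is queried."""
    return max(candidates, key=reward_fn)

# Candidates as (text, per-token entropies); score by mean entropy, negated.
candidates = [("draft A", [2.0, 1.8, 2.1]), ("draft B", [0.3, 0.4, 0.2])]
best = best_of_n(candidates, lambda c: -sum(c[1]) / len(c[1]))
```

Here the second draft wins because its lower average entropy signals lower model uncertainty; a segment-level centroid reward could be dropped in as `reward_fn` without changing the selection loop.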

Entities

Institutions

  • arXiv

Sources