ARTFEED — Contemporary Art Intelligence

Entropy-Gradient Inversion: New Framework for Large Reasoning Models

ai-technology · 2026-05-20

A new paper on arXiv (2605.17770) identifies a robust negative correlation between token entropy and logit gradients in Large Reasoning Models (LRMs). The authors formalize this pattern as Entropy-Gradient Inversion and describe it as a geometric fingerprint of reasoning capability. Building on it, they propose Correlation-Regularized Group Policy Optimization (CorR-PO), which embeds the inversion signature into the reward regularization of reinforcement learning (RL). The work targets two problems: the gap between token-level behavioral analysis and internal reasoning mechanisms, and the instability of RL for reasoning optimization. According to the authors, experiments on various reasoning benchmarks demonstrate the effectiveness of the approach.
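To make the idea of a correlation between token entropy and logit gradients concrete, here is a minimal sketch of how such a statistic could be measured on one sequence. It is an illustration under stated assumptions, not the paper's definition: it assumes the "logit gradient" is the gradient of the softmax cross-entropy loss with respect to the logits (which equals the predicted probabilities minus the one-hot target), summarizes it by its L2 norm per token, and uses the Pearson coefficient. The function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def token_entropy(probs, eps=1e-12):
    # Shannon entropy (in nats) of each token's predictive distribution.
    return -(probs * np.log(probs + eps)).sum(axis=-1)

def logit_grad_norm(probs, targets):
    # ASSUMPTION: "logit gradient" = gradient of cross-entropy w.r.t. logits,
    # i.e. probs - one_hot(target); we take its L2 norm per token.
    grad = probs.copy()
    grad[np.arange(len(targets)), targets] -= 1.0
    return np.linalg.norm(grad, axis=-1)

def entropy_gradient_correlation(logits, targets):
    # Pearson correlation between per-token entropy and gradient norm.
    # A negative value would match the "inversion" pattern described above.
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=int)
    probs = softmax(logits)
    return float(np.corrcoef(token_entropy(probs),
                             logit_grad_norm(probs, targets))[0, 1])

# Toy sequence: 3 tokens over a 3-word vocabulary (made-up numbers).
toy_logits = [[10.0, 0.0, 0.0],   # confident, but target is a different token
              [0.0, 0.0, 0.0],    # maximally uncertain
              [2.0, 0.0, 0.0]]    # mildly confident, target is a different token
toy_targets = [1, 0, 1]
print(entropy_gradient_correlation(toy_logits, toy_targets))
```

In this toy case the confidently wrong token has low entropy and a large gradient, while the uncertain token has high entropy and a smaller gradient, so the coefficient comes out negative. Whether the paper measures the gradient this way, or aggregates over tokens, layers, or samples differently, is not stated in this summary.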

Key facts

  • Paper arXiv:2605.17770 introduces Entropy-Gradient Inversion
  • Entropy-Gradient Inversion is a negative correlation between token entropy and logit gradients
  • It acts as a geometric fingerprint for LRM reasoning capability
  • CorR-PO embeds the inversion signature into RL reward regularization
  • The work addresses the gap between token-level behavioral analysis and internal reasoning mechanisms
  • It also addresses instability of RL for reasoning optimization
  • Experiments were conducted on various reasoning benchmarks
  • The arXiv listing marks the paper as a new announcement
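As a rough illustration of what "embedding the inversion signature into RL reward regularization" could look like, the sketch below adds a correlation term to each sampled response's task reward and then normalizes rewards within the group, as group-based policy optimization methods commonly do. Everything here is an assumption made for illustration: the additive form, the weight `lam`, the function name, and the use of a precomputed per-response entropy-gradient correlation are hypothetical and are not taken from the paper.

```python
import numpy as np

def shaped_group_advantages(task_rewards, entropy_grad_corr, lam=0.1, eps=1e-8):
    """Hypothetical correlation-regularized advantages for one prompt's group.

    task_rewards      : per-response task reward (e.g. 1.0 if the answer is correct)
    entropy_grad_corr : per-response entropy/gradient correlation in [-1, 1]
    lam               : ASSUMED regularization weight, not a value from the paper
    """
    r = np.asarray(task_rewards, dtype=float)
    c = np.asarray(entropy_grad_corr, dtype=float)
    # Responses with a more negative correlation (stronger inversion) get a bonus.
    shaped = r - lam * c
    # Group-relative normalization: zero mean, unit scale within the group.
    return (shaped - shaped.mean()) / (shaped.std() + eps)

# Four sampled responses to one prompt (made-up numbers).
advantages = shaped_group_advantages(
    task_rewards=[1.0, 1.0, 0.0, 0.0],
    entropy_grad_corr=[-0.8, -0.2, -0.8, -0.2],
)
print(advantages)
```

With a small `lam`, task correctness still dominates the ranking, and the correlation term only breaks ties between responses with the same task reward. How CorR-PO actually weights or combines the two signals is not described in this summary.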

Entities

Institutions

  • arXiv

Sources