Entropy-Gradient Inversion: New Framework for Large Reasoning Models

ai-technology · 2026-05-20

A new paper on arXiv (2605.17770) introduces Entropy-Gradient Inversion, which the authors present as a geometric fingerprint of reasoning capability in Large Reasoning Models (LRMs). The term names a robust negative correlation they identify between token entropy and logit gradients. Building on this observation, they propose Correlation-Regularized Group Policy Optimization (CorR-PO), which embeds the inversion signature into reward regularization for reinforcement learning (RL). The work targets two problems: the gap between token-level behavioral analysis and internal reasoning mechanisms, and the instability of RL when used to optimize reasoning. The authors report that experiments on various reasoning benchmarks demonstrate the approach's effectiveness.
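To make the correlation concrete, here is a minimal sketch of how a per-token statistic of this kind could be computed. It is not the paper's procedure: the summary does not say how "logit gradients" are defined, so the sketch assumes the gradient of the cross-entropy loss with respect to the logits (softmax probabilities minus the one-hot target) and summarizes it by its L2 norm. The function names and the random toy data are hypothetical, and the sign of the result on random logits says nothing about trained models.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def logit_grad_norm(probs, target):
    """L2 norm of d(cross-entropy)/d(logits) = probs - onehot(target).

    ASSUMPTION: this is one plausible reading of "logit gradients";
    the paper may define the quantity differently.
    """
    return math.sqrt(
        sum((p - (1.0 if i == target else 0.0)) ** 2 for i, p in enumerate(probs))
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def entropy_gradient_correlation(logit_rows, targets):
    """Correlate per-token entropy with per-token logit-gradient norm."""
    entropies, grad_norms = [], []
    for logits, target in zip(logit_rows, targets):
        probs = softmax(logits)
        entropies.append(token_entropy(probs))
        grad_norms.append(logit_grad_norm(probs, target))
    return pearson(entropies, grad_norms)


# Toy data: 64 "tokens" over a 10-word vocabulary with varying sharpness.
random.seed(0)
vocab = 10
logit_rows = []
for _ in range(64):
    scale = random.uniform(0.1, 5.0)
    logit_rows.append([random.gauss(0.0, scale) for _ in range(vocab)])
targets = [random.randrange(vocab) for _ in range(64)]

r = entropy_gradient_correlation(logit_rows, targets)
print(f"Pearson r between entropy and logit-gradient norm: {r:.3f}")
```

In a real setting the logits and targets would come from a model's forward pass over sampled reasoning traces rather than from a random generator.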

Key facts

Paper arXiv:2605.17770 introduces Entropy-Gradient Inversion
Entropy-Gradient Inversion is a negative correlation between token entropy and logit gradients
It acts as a geometric fingerprint for LRM reasoning capability
CorR-PO embeds the inversion signature into RL reward regularization
The work addresses the gap between token-level analysis and internal reasoning
It also addresses instability of RL for reasoning optimization
Experiments were conducted on various reasoning benchmarks
The paper is categorized as a new announcement on arXiv
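As an illustration of what embedding a correlation signature into reward regularization might look like, the sketch below shapes each rollout's task reward with a correlation penalty and then normalizes rewards within a group. Everything here is an assumption for illustration: the summary does not give CorR-PO's actual regularizer, so the linear penalty, the weight `lam`, the GRPO-style group normalization, and all function names and numbers are hypothetical.

```python
import math


def regularized_rewards(task_rewards, correlations, lam=0.1):
    """Add a bonus for more negative entropy-gradient correlation.

    ASSUMPTION: a linear penalty `reward - lam * r`; the paper's actual
    regularization term is not described in this summary.
    """
    return [rw - lam * c for rw, c in zip(task_rewards, correlations)]


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts (GRPO-style).

    ASSUMPTION: mean/std normalization over the group, as in
    group-relative policy optimization methods.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four rollouts for the same prompt:
# task reward (1.0 = correct answer) and measured correlation per rollout.
task_rewards = [1.0, 1.0, 0.0, 0.0]
correlations = [-0.6, 0.2, -0.4, 0.3]

shaped = regularized_rewards(task_rewards, correlations, lam=0.5)
advantages = group_advantages(shaped)

for rw, c, s, a in zip(task_rewards, correlations, shaped, advantages):
    print(f"task={rw:.1f} corr={c:+.1f} shaped={s:+.2f} advantage={a:+.2f}")
```

Under this shaping, two rollouts with the same task reward are ranked by how strongly negative their correlation is, which is one simple way a signature like this could steer the policy update.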
