Entropy-Gradient Inversion: New Framework for Large Reasoning Models
A new paper on arXiv (2605.17770) introduces Entropy-Gradient Inversion, a robust negative correlation between token entropy and logit gradients that the authors identify in Large Reasoning Models (LRMs) and treat as a geometric fingerprint of reasoning capability. Building on this observation, they propose Correlation-Regularized Group Policy Optimization (CorR-PO), which embeds the inversion signature into the reward regularization of reinforcement learning (RL). The work targets two problems: the gap between token-level behavioral analysis and internal reasoning mechanisms, and the instability of RL for reasoning optimization. The authors report experiments on various reasoning benchmarks in support of the approach.
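To make the central quantity concrete, here is a minimal sketch of how a correlation between token entropy and logit gradients could be measured. The summary does not say which gradient the paper uses, so this sketch assumes the gradient of the negative log-likelihood of the sampled token with respect to the logits, summarized by its L2 norm. All function names are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def logit_grad_norm(probs, target):
    """L2 norm of d(-log p[target]) / d(logits), which equals probs - onehot(target).

    ASSUMPTION: the paper's exact "logit gradient" is not specified in this
    summary; this is one common choice, used here only for illustration.
    """
    return math.sqrt(sum((p - (1.0 if i == target else 0.0)) ** 2
                         for i, p in enumerate(probs)))

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def entropy_gradient_correlation(step_logits, sampled_tokens):
    """Correlate per-token entropy with per-token logit-gradient norm."""
    entropies, norms = [], []
    for logits, tok in zip(step_logits, sampled_tokens):
        probs = softmax(logits)
        entropies.append(token_entropy(probs))
        norms.append(logit_grad_norm(probs, tok))
    return pearson(entropies, norms)
```

Under these assumptions, a value near -1 would correspond to the inversion described above. The sign and size of the statistic on a real model depend on the model, the data, and the gradient definition, so this sketch only shows how such a number could be computed.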
Key facts
- Paper arXiv:2605.17770 introduces Entropy-Gradient Inversion
- Entropy-Gradient Inversion is a negative correlation between token entropy and logit gradients
- It acts as a geometric fingerprint for LRM reasoning capability
- CorR-PO embeds the inversion signature into RL reward regularization
- The work addresses the gap between token-level behavioral analysis and internal reasoning mechanisms
- It also addresses instability of RL for reasoning optimization
- Experiments were conducted on various reasoning benchmarks
- The paper is categorized as a new announcement on arXiv
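The summary does not spell out how CorR-PO embeds the inversion signature into reward regularization. As a rough illustration of the general idea only, the sketch below subtracts a weighted correlation term from each response's task reward, so that a more negative entropy-gradient correlation earns a bonus. It then normalizes rewards within a group of sampled responses, as group-based policy optimization methods commonly do. The functions `regularized_rewards` and `group_advantages`, the weight `lam`, and the specific formula are assumptions for illustration, not the paper's objective.

```python
import math

def regularized_rewards(task_rewards, correlations, lam=0.1):
    """Shape rewards with a correlation penalty: r' = r - lam * rho.

    ASSUMPTION: this linear form is hypothetical. A more negative rho
    (stronger inversion) raises the shaped reward.
    """
    return [r - lam * rho for r, rho in zip(task_rewards, correlations)]

def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    ASSUMPTION: mean/std normalization is borrowed from group-relative
    policy optimization methods; the paper's exact scheme is not given here.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# Usage: two responses with equal task reward but different correlation.
shaped = regularized_rewards([1.0, 1.0], [-0.5, 0.5], lam=0.2)
advantages = group_advantages(shaped)
```

In this toy setup the response with the negative correlation receives the higher shaped reward and therefore the larger advantage, which is the qualitative behavior a correlation-regularized reward would be expected to encourage.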
Entities
Institutions
- arXiv