ARTFEED — Contemporary Art Intelligence

ICRL: Joint Training of Solver and Critic via Reinforcement Learning

ai-technology · 2026-05-18

The ICRL framework (Learning to Internalize Self-Critique with Reinforcement Learning) jointly trains a solver and a critic from a shared backbone, with the goal of converting critique-induced success into unassisted solver ability. The critic is rewarded according to the solver's subsequent performance gain, which encourages constructive feedback. To address the distribution shift between critique-conditioned and critique-free behavior, ICRL introduces a distribution-calibration re-weighting ratio. The aim is for agents built on large language models to internalize critique guidance so that they no longer need external feedback at test time.
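As a rough illustration of a critic reward tied to the solver's performance gain, the sketch below scores a critique by how much it raises the solver's average task score. The function name `critic_reward`, the use of a simple mean over rollouts, and the 0/1 scoring are assumptions made for illustration; the summary does not give the paper's actual reward formula.

```python
from statistics import mean


def critic_reward(scores_with_critique, scores_without_critique):
    """Hypothetical critic reward: gain in mean solver score.

    scores_with_critique:    task scores of solver rollouts that saw the critique
    scores_without_critique: task scores of solver rollouts on the same task
                             without the critique (the baseline)
    A positive value means the critique helped; a negative value means it hurt.
    """
    return mean(scores_with_critique) - mean(scores_without_critique)


# Example: four rollouts per condition, scored 1.0 for success and 0.0 for failure.
gain = critic_reward([1.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0])
```

Under this assumed form, a critique that does not change the solver's outcomes earns zero reward, so the critic is only paid for feedback that actually moves the solver.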

Key facts

  • ICRL stands for Learning to Internalize Self-Critique with Reinforcement Learning
  • The framework jointly trains a solver and a critic from a shared backbone
  • The critic is rewarded based on the solver's subsequent performance gain
  • ICRL introduces a distribution-calibration re-weighting ratio
  • The approach addresses distribution shift between critique-conditioned and critique-free behavior
  • The goal is to convert critique-induced success into unassisted solver ability
  • The paper is available on arXiv with ID 2605.15224
  • The publication date is not specified in the abstract
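
One common way to correct for a shift between two sampling distributions is an importance-style ratio, and the sketch below uses that as a stand-in for the distribution-calibration re-weighting ratio listed above. This is an assumption, not the paper's definition: the function names, the sequence-level ratio of critique-free to critique-conditioned likelihood, and the clipping value are all hypothetical.

```python
import math


def reweighting_ratio(logp_free, logp_conditioned, clip=5.0):
    """Hypothetical sequence-level ratio pi(y | x) / pi(y | x, critique).

    logp_free:        per-token log-probs of the response without the critique
    logp_conditioned: per-token log-probs of the same response with the critique
    clip:             upper bound that keeps a single sample from dominating
    """
    log_ratio = sum(logp_free) - sum(logp_conditioned)
    return min(math.exp(log_ratio), clip)


def reweighted_objective(logp_free, logp_conditioned, advantage, clip=5.0):
    """Policy-gradient style surrogate for one sample, scaled by the ratio.

    The ratio is treated as a constant weight: responses that the critique-free
    solver already finds plausible contribute more to the update.
    """
    weight = reweighting_ratio(logp_free, logp_conditioned, clip)
    return weight * advantage * sum(logp_free)
```

In this sketch, a response that is equally likely with and without the critique gets weight 1, while a response that only the critique-conditioned solver would produce is down-weighted.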

Entities

Institutions

  • arXiv

Sources