ARTFEED — Contemporary Art Intelligence

DERL: Differentiable Evolutionary Reinforcement Learning for Reward Optimization

ai-technology · 2026-05-14

Differentiable Evolutionary Reinforcement Learning (DERL) is a new framework that addresses the challenge of reward signal design in reinforcement learning. DERL uses a bi-level structure in which a Meta-Optimizer evolves reward functions by composing atomic primitives, and it introduces differentiability by updating the Meta-Optimizer with policy gradients derived from inner-loop validation performance. This contrasts with prior black-box methods, which treat reward functions as non-differentiable and rely on derivative-free search. The approach aims to exploit the causal dynamics between reward modifications and policy outcomes on complex reasoning tasks.

Key facts

  • DERL stands for Differentiable Evolutionary Reinforcement Learning
  • It is a bi-level framework for autonomous discovery of optimal reward structures
  • The Meta-Optimizer evolves reward functions through composition of atomic primitives
  • Differentiability is introduced by updating the Meta-Optimizer using policy gradients
  • Gradients are derived from inner-loop validation performance
  • Prior methods treat reward functions as black boxes using derivative-free search
  • The framework targets complex reasoning tasks in reinforcement learning
  • The paper is available on arXiv with ID 2512.13399
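The bi-level structure above can be illustrated with a toy sketch. Everything concrete below is an assumption chosen to keep the example self-contained: the two-armed bandit task, the indicator primitives, and the finite-difference meta-gradient (standing in for the paper's policy-gradient estimator). Only the overall shape mirrors the summary: an outer Meta-Optimizer adjusts a reward function composed from atomic primitives, driven by the inner loop's validation performance.

```python
import math
import random

random.seed(0)

# Hypothetical toy task: a two-armed bandit where arm 1 yields the true
# task reward 1.0 and arm 0 yields 0.0.
TRUE_REWARD = [0.0, 1.0]

def primitives(action):
    # Atomic reward primitives (assumed here): phi_k(a) = 1 if a == k else 0.
    return [1.0 if action == k else 0.0 for k in range(2)]

def shaped_reward(weights, action):
    # Evolved reward function: a weighted composition of atomic primitives.
    return sum(w * p for w, p in zip(weights, primitives(action)))

def act_prob(theta):
    # Policy: probability of picking arm 1 under a single-logit softmax.
    return 1.0 / (1.0 + math.exp(-theta))

def inner_train(weights, steps=200, lr=0.5):
    # Inner loop: REINFORCE on the *shaped* reward, not the true one.
    theta = 0.0
    for _ in range(steps):
        p1 = act_prob(theta)
        a = 1 if random.random() < p1 else 0
        r = shaped_reward(weights, a)
        theta += lr * r * (a - p1)  # grad log pi(a) = a - p1 for this policy
    return theta

def validation_perf(theta):
    # Validation uses the TRUE task reward to score the trained policy.
    p1 = act_prob(theta)
    return p1 * TRUE_REWARD[1] + (1.0 - p1) * TRUE_REWARD[0]

def meta_step(weights, meta_lr=0.5, eps=0.1):
    # Outer loop: estimate d(validation)/d(weights) through the inner loop.
    # Finite differences stand in for the paper's policy-gradient estimator;
    # the structure (meta-update driven by inner-loop validation) is the same.
    grad = []
    for k in range(len(weights)):
        w_plus, w_minus = list(weights), list(weights)
        w_plus[k] += eps
        w_minus[k] -= eps
        g = (validation_perf(inner_train(w_plus))
             - validation_perf(inner_train(w_minus))) / (2.0 * eps)
        grad.append(g)
    return [w + meta_lr * g for w, g in zip(weights, grad)]

weights = [0.5, 0.5]  # start from an uninformative reward
for _ in range(20):
    weights = meta_step(weights)

policy = inner_train(weights)  # the meta-loop should have shaped the reward
                               # so validation performance approaches 1.0
```

The meta-loop learns to weight the arm-1 primitive more heavily than the arm-0 primitive, because only that composition makes the inner REINFORCE loop produce a policy that scores well on the true validation objective.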

Entities

Institutions

  • arXiv

Sources