ARTFEED — Contemporary Art Intelligence

Closed-Form Policy Steering for Frozen Offline RL Actors

other · 2026-04-29

A recent arXiv preprint (2604.22873) presents a closed-form method for steering frozen offline reinforcement learning (RL) policies at deployment, with no retraining, by composing the frozen actor with a goal-conditioned prior via a Product-of-Experts (PoE). The central finding is that precision-weighted composition degrades gracefully: even under corrupted or random priors, the composed policy stays anchored to the frozen actor, whereas additive and prior-only adaptation collapse. A KL-budget selector for the mixing weight recovers a near-oracle operating point. For diagonal-Gaussian actors and priors, the paper gives a closed-form identity: PoE with mixing coefficient alpha yields the same deterministic policy as KL-regularized adaptation with beta = alpha / (1 - alpha). The setting targets deployments where retraining is ruled out by data, cost, or governance constraints.
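The preprint itself is not quoted here, so the following is a minimal NumPy sketch of how tempered, precision-weighted PoE composition of two diagonal Gaussians typically works, plus one plausible reading of a KL-budget weight selector. The function names (poe_compose, gauss_kl, select_alpha) and the grid-sweep selection rule are illustrative assumptions, not the paper's API.

```python
import numpy as np

def poe_compose(mu_a, var_a, mu_p, var_p, alpha):
    """Tempered Product-of-Experts of two diagonal Gaussians.

    Composes the frozen actor N(mu_a, var_a) with the goal-conditioned
    prior N(mu_p, var_p) as p(a) proportional to actor^(1-alpha) *
    prior^alpha. Each expert's mean is weighted by its tempered
    precision, so a degraded prior with large variance contributes
    almost nothing and the composed mean stays near the frozen actor.
    """
    prec = (1.0 - alpha) / var_a + alpha / var_p
    mu = ((1.0 - alpha) * mu_a / var_a + alpha * mu_p / var_p) / prec
    return mu, 1.0 / prec

def gauss_kl(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def select_alpha(mu_a, var_a, mu_p, var_p, kl_budget):
    """Pick the largest alpha whose composed policy stays within a KL
    budget of the frozen actor (an assumed selection rule; the paper's
    actual selector may differ)."""
    best = 0.0
    for alpha in np.linspace(0.0, 1.0, 101):
        mu, var = poe_compose(mu_a, var_a, mu_p, var_p, alpha)
        if gauss_kl(mu, var, mu_a, var_a) <= kl_budget:
            best = alpha
    return best

# Demo: a near-random prior (huge variance) barely moves the policy.
mu_a, var_a = np.zeros(3), np.full(3, 0.1)
mu_p, var_p = np.full(3, 5.0), np.full(3, 100.0)   # degraded prior
mu, _ = poe_compose(mu_a, var_a, mu_p, var_p, alpha=0.5)
print(mu)  # stays close to mu_a: precision weighting discounts the noisy prior
```

An additive baseline of the form mu_a + alpha * (mu_p - mu_a) would shift the mean by the same amount regardless of prior quality, which matches the reported collapse under degraded priors; precision weighting avoids this because a low-confidence expert simply carries little weight.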

Key facts

  • arXiv paper 2604.22873
  • Offline RL policy adaptation without retraining
  • Product-of-Experts composition with goal-conditioned prior
  • Precision-weighted composition shows graceful degradation
  • Additive and prior-only adaptation collapse under degraded priors
  • KL-budget selector recovers near-oracle operating point
  • Closed-form identity: PoE with alpha equals KL-regularized adaptation with beta = alpha/(1-alpha) (checked numerically in the sketch after this list)
  • Frozen actor setting for diagonal-Gaussian distributions
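
The alpha/beta identity is easy to verify numerically. The sketch below (arbitrary values) checks that for diagonal Gaussians the tempered PoE mean with coefficient alpha coincides with the mean of the policy proportional to actor * prior^beta for beta = alpha / (1 - alpha); that exponent form of KL-regularized adaptation is the standard one and is assumed here rather than quoted from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
mu_a, var_a = rng.normal(size=dim), rng.uniform(0.5, 2.0, size=dim)  # frozen actor
mu_p, var_p = rng.normal(size=dim), rng.uniform(0.5, 2.0, size=dim)  # goal prior

alpha = 0.3
beta = alpha / (1.0 - alpha)

# Tempered PoE: argmax of (1 - alpha) * log actor + alpha * log prior.
prec_poe = (1.0 - alpha) / var_a + alpha / var_p
mu_poe = ((1.0 - alpha) * mu_a / var_a + alpha * mu_p / var_p) / prec_poe

# KL-regularized adaptation: argmax of log actor + beta * log prior,
# i.e. the policy proportional to actor * prior**beta.
prec_kl = 1.0 / var_a + beta / var_p
mu_kl = (mu_a / var_a + beta * mu_p / var_p) / prec_kl

# Same deterministic policy: the two objectives differ only by the
# positive factor 1 / (1 - alpha), so their maximizers coincide.
assert np.allclose(mu_poe, mu_kl)
```

The variances of the two composed distributions differ by that same scale factor, which is why the identity is stated for the deterministic (mean) policy rather than the full distribution.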

Entities

Institutions

  • arXiv
