ARTFEED — Contemporary Art Intelligence

Adaptive Batch Scaling Improves Reinforcement Learning Efficiency

other · 2026-05-23

A new paper on arXiv challenges the conventional wisdom that large-batch training is incompatible with reinforcement learning (RL). The authors observe that the non-stationarity of RL changes over the course of training: early stages need small batches to preserve plasticity, while late stages benefit from large batches for convergence. They propose Adaptive Batch Scaling (ABS), which adjusts the batch size dynamically according to policy stability. Stability is measured with a new metric called Behavioral Divergence, which quantifies action-level shifts between consecutive policy updates. The approach aims to improve scalability and performance in on-policy RL.
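The summary does not give the paper's formulas, so the following is only a minimal sketch of the idea under stated assumptions. Behavioral Divergence is approximated here as the mean total-variation distance between the action distributions of two consecutive policies on a shared set of probe states. The batch size doubles when that value falls below a threshold and halves otherwise. The function names, the threshold, the growth factor, and the size limits are all hypothetical and are not taken from the paper.

```python
def behavioral_divergence(prev_probs, new_probs):
    """Mean total-variation distance between two policies' action
    distributions over the same probe states.

    ASSUMPTION: this definition is illustrative; the paper's actual
    Behavioral Divergence metric may be defined differently.
    """
    if not prev_probs or len(prev_probs) != len(new_probs):
        raise ValueError("need the same non-empty set of probe states")
    total = 0.0
    for p, q in zip(prev_probs, new_probs):
        # Total variation between two discrete distributions.
        total += 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return total / len(prev_probs)


def next_batch_size(current, divergence, *, threshold=0.05, growth=2,
                    min_size=64, max_size=4096):
    """Pick the next batch size from the measured divergence.

    ASSUMPTION: the doubling/halving rule and every default value here
    are hypothetical placeholders, not the paper's schedule.
    """
    if divergence < threshold:
        proposed = current * growth   # policy is stable: use a larger batch
    else:
        proposed = current // growth  # policy is shifting: use a smaller batch
    return max(min_size, min(max_size, proposed))


if __name__ == "__main__":
    # Action probabilities on two probe states, before and after an update.
    before = [[0.5, 0.5], [0.9, 0.1]]
    after = [[0.5, 0.5], [0.6, 0.4]]
    d = behavioral_divergence(before, after)
    print(next_batch_size(256, d))
```

In this sketch an early, fast-changing policy keeps the batch near the lower limit, while a late, stable policy lets it grow toward the upper limit, which mirrors the small-then-large pattern the authors describe.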

Key facts

  • Paper challenges the view that large-batch training is incompatible with RL
  • Non-stationarity evolves during training
  • Early stages need small batches
  • Late stages benefit from large batches
  • Proposes Adaptive Batch Scaling (ABS)
  • ABS uses Behavioral Divergence metric
  • Behavioral Divergence measures action-level shifts
  • Aims to improve RL scalability and performance

Entities

Institutions

  • arXiv

Sources