Adaptive Batch Scaling Improves Reinforcement Learning Efficiency

other · 2026-05-23

A new paper on arXiv challenges the conventional wisdom that large-batch training is incompatible with reinforcement learning (RL). The authors observe that the non-stationarity of RL changes over the course of training: early stages need small batches to preserve plasticity, while late stages benefit from large batches for convergence. They propose Adaptive Batch Scaling (ABS), which dynamically adjusts the batch size according to policy stability, measured with a new metric called Behavioral Divergence. The metric quantifies action-level shifts in the policy between consecutive updates. The approach aims to improve scalability and performance in on-policy RL.
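
To make the idea concrete, here is a minimal sketch of how a divergence-driven batch schedule could look. It is not the paper's implementation: the summary does not give the formula for Behavioral Divergence or the scaling rule, so this sketch assumes the metric is the mean total-variation distance between the action distributions of two consecutive policies on the same states, and it assumes a linear mapping from divergence to batch size. The names `behavioral_divergence`, `next_batch_size`, `min_batch`, `max_batch`, and `threshold` are all hypothetical.

```python
from typing import Sequence


def behavioral_divergence(
    old_probs: Sequence[Sequence[float]],
    new_probs: Sequence[Sequence[float]],
) -> float:
    """Mean total-variation distance between two policies' action
    distributions on the same states (an ASSUMED stand-in for the
    paper's Behavioral Divergence, whose formula is not given here)."""
    if len(old_probs) != len(new_probs) or not old_probs:
        raise ValueError("need two equally sized, non-empty batches")
    total = 0.0
    for p, q in zip(old_probs, new_probs):
        # Total-variation distance for one state: half the L1 distance.
        total += 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return total / len(old_probs)


def next_batch_size(
    divergence: float,
    min_batch: int = 64,
    max_batch: int = 4096,
    threshold: float = 0.1,
) -> int:
    """Hypothetical scaling rule: a policy that is still shifting
    (divergence >= threshold) gets min_batch; a fully stable policy
    (divergence == 0) gets max_batch; values in between interpolate."""
    stability = 1.0 - min(divergence / threshold, 1.0)
    return int(round(min_batch + stability * (max_batch - min_batch)))


# Action probabilities for two states, before and after one update.
old = [[0.5, 0.5], [0.9, 0.1]]
new = [[0.5, 0.5], [0.7, 0.3]]
d = behavioral_divergence(old, new)
batch = next_batch_size(d)
```

Under these assumptions, a policy that changes a lot between updates keeps the batch small, and the batch grows toward `max_batch` as consecutive policies agree more closely, which matches the early-plasticity, late-convergence pattern the authors describe.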

Key facts

Paper challenges the view that large-batch training is incompatible with RL
Non-stationarity evolves during training
Early stages need small batches
Late stages benefit from large batches
Proposes Adaptive Batch Scaling (ABS)
ABS uses Behavioral Divergence metric
Behavioral Divergence measures action-level shifts
Aims to improve scalability and performance in on-policy RL
