Deep RL Fails to Beat Calibrated Baselines in Adaptive Resource Control

other · 2026-05-27

A new benchmark study challenges the effectiveness of deep reinforcement learning (DRL) for adaptive resource control. The paper introduces RLScale-Bench, a reproducible benchmark that evaluates six mainstream DRL algorithms (PPO, DQN, A2C, SAC, TD3, and DDPG) against a calibrated rule-based autoscaler. Across six workload patterns and five seeds (240 runs), the rule-based controller achieved the lowest cost on every workload, though it trailed the best RL agents on bursty and flash traffic. The benchmark is instantiated on Kubernetes Horizontal Pod Autoscaling and also probes generalization under distribution shift. The study finds that discrete-action DRL methods perform poorly and that calibrated baselines are often overlooked in prior work. The paper is available on arXiv under ID 2605.26418.
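This summary does not describe how the paper's calibrated rule-based controller works. For readers unfamiliar with this kind of baseline, the following is a minimal sketch of the proportional scaling rule documented for Kubernetes Horizontal Pod Autoscaling: the desired replica count is the current count scaled by the ratio of observed to target metric, rounded up, with no change when the ratio is within a tolerance band around 1.0. The function name, the `min_replicas` and `max_replicas` defaults, and the example metric values are illustrative assumptions and are not taken from RLScale-Bench.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     tolerance=0.1, min_replicas=1, max_replicas=10):
    """HPA-style proportional rule (illustrative; not the paper's controller).

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped
    to the [min_replicas, max_replicas] range (assumed bounds).
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Inside the tolerance band: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical readings: 4 replicas, 90% average CPU, 60% target.
    print(desired_replicas(4, 90, 60))
```

In a rule like this, "calibrating" would mean tuning values such as the target metric, the tolerance, and the replica bounds for the workload; which parameters the paper actually calibrates is not stated in this summary.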

Key facts

Calibrated rule-based autoscaler beats six DRL algorithms on cost across all workloads tested.
RLScale-Bench is a reproducible benchmark for DRL on adaptive resource control.
Evaluated algorithms: PPO, DQN, A2C, SAC, TD3, DDPG.
Six workload patterns and five seeds used, totaling 240 runs.
Benchmark instantiated on Kubernetes Horizontal Pod Autoscaling.
The rule-based controller trails the best RL agents on bursty and flash traffic.
Paper available on arXiv: 2605.26418.
