ARTFEED — Contemporary Art Intelligence

Kalman Filter Offers Principled Alternative to Reward Normalization in RL

other · 2026-04-29

A recent study introduces K-Score, a technique that replaces conventional reward normalization in policy-gradient reinforcement learning with a 1D Kalman filter for online reward estimation. The filter recursively estimates the latent reward mean, smoothing high-variance returns and adapting to non-stationary environments without any change to existing policy architectures. Experiments on LunarLander and CartPole show faster convergence and lower training variance than standard normalization. The source code is publicly available.
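The article does not reproduce the paper's exact update equations, but a scalar Kalman filter tracking a slowly drifting reward mean follows a standard predict/correct recursion. A minimal sketch, assuming generic process- and observation-noise parameters (all names and values here are illustrative, not from the paper):

```python
class Kalman1D:
    """Recursively estimate a latent (possibly drifting) reward mean."""

    def __init__(self, process_var=1e-4, obs_var=1.0,
                 init_mean=0.0, init_var=1.0):
        self.mean = init_mean
        self.var = init_var
        self.process_var = process_var  # how fast the latent mean may drift
        self.obs_var = obs_var          # noise in each observed return

    def update(self, reward):
        # Predict: the latent mean may have drifted, so uncertainty grows.
        self.var += self.process_var
        # Correct: blend the prediction with the new observation.
        gain = self.var / (self.var + self.obs_var)  # Kalman gain in [0, 1]
        self.mean += gain * (reward - self.mean)
        self.var *= 1.0 - gain
        return self.mean
```

Because `process_var` keeps the gain bounded away from zero, the estimate continues to track the reward mean even if the environment's reward scale shifts mid-training, which is the non-stationarity property the summary highlights.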

Key facts

  • Method integrates a 1D Kalman filter for online reward estimation.
  • Recursively estimates latent reward mean, smoothing high-variance returns.
  • Adapts to non-stationary environments.
  • Requires no modification to existing policy architectures.
  • Experiments on LunarLander and CartPole show accelerated convergence.
  • Reduces training variance compared to standard normalization.
  • Code is available at the provided URL.
  • Paper is titled 'K-Score: Kalman Filter as a Principled Alternative to Reward Normalization in Reinforcement Learning'.
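Since the method leaves the policy architecture untouched, the filtered mean can simply replace the batch baseline inside an ordinary policy-gradient loss. A hypothetical sketch of that substitution (the `kf_update` callable stands in for any running-mean recursion, such as a 1D Kalman filter; the interface is assumed, not taken from the paper):

```python
def reinforce_loss(log_probs, returns, kf_update):
    """REINFORCE-style loss with a per-step filtered baseline.

    log_probs: log pi(a_t | s_t) for each step of an episode.
    returns:   observed return G_t at each step.
    kf_update: callable returning the current estimate of the mean
               return after observing G_t (hypothetical interface).
    """
    loss = 0.0
    for logp, g in zip(log_probs, returns):
        # Centered advantage: A_t = G_t - m_t, where m_t is the
        # filter's running estimate of the latent reward mean.
        advantage = g - kf_update(g)
        loss -= logp * advantage  # gradient ascent on E[logp * A]
    return loss
```

Unlike z-score normalization over a batch of returns, this baseline needs no batch statistics, which is consistent with the claimed per-step, online operation.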

Entities

Institutions

  • arXiv

Sources