Physics-Informed Reward Shaping Improves Building Energy Management

other · 2026-05-28

A new method, PIRS (Physics-Informed Reward Shaping), replaces ad-hoc comfort proxies in deep reinforcement learning for building energy management. Designed for Soft Actor-Critic (SAC) agents, PIRS grounds the comfort signal in the ISO 7730 Predicted Mean Vote (PMV) model of thermal comfort. This makes the reward more interpretable and yields a standards-compliant comfort proxy, while leaving the rest of the learning pipeline unchanged. PIRS was evaluated in CityLearn v2.1.2 on the 2022 challenge, phase 1, where a centralized SAC agent was trained for 50,000 steps with the aim of addressing occupant comfort alongside grid-aware energy efficiency.
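The PMV model that PIRS builds on is a published heat-balance calculation. The following is a minimal Python sketch of the ISO 7730 (Fanger) PMV/PPD procedure, following the iterative scheme given in the standard's Annex D. It is illustrative only and is not code from the PIRS work. The function name pmv_ppd, its argument names, and the units (°C, relative air velocity in m/s, relative humidity in %, met, clo) are assumptions of this sketch, and the standard's applicability-range checks are omitted.

```python
import math


def pmv_ppd(ta, tr, vel, rh, met, clo, wme=0.0):
    """Illustrative ISO 7730 PMV/PPD sketch (hypothetical helper, not the PIRS authors' code).

    ta: air temperature [degC], tr: mean radiant temperature [degC],
    vel: relative air velocity [m/s], rh: relative humidity [%],
    met: metabolic rate [met], clo: clothing insulation [clo],
    wme: external work [met], usually 0.
    Returns (pmv, ppd) with ppd in percent.
    """
    pa = rh * 10.0 * math.exp(16.6536 - 4030.183 / (ta + 235.0))  # water vapour partial pressure [Pa]
    icl = 0.155 * clo      # clothing insulation [m2.K/W]
    m = met * 58.15        # metabolic rate [W/m2]
    w = wme * 58.15        # external work [W/m2]
    mw = m - w             # internal heat production [W/m2]
    fcl = 1.0 + 1.29 * icl if icl <= 0.078 else 1.05 + 0.645 * icl  # clothing area factor
    hcf = 12.1 * math.sqrt(vel)  # forced-convection heat transfer coefficient
    taa = ta + 273.0
    tra = tr + 273.0

    # Solve iteratively for the clothing surface temperature (in kelvin / 100).
    tcla = taa + (35.5 - ta) / (3.5 * icl + 0.1)
    p1 = icl * fcl
    p2 = p1 * 3.96
    p3 = p1 * 100.0
    p4 = p1 * taa
    p5 = 308.7 - 0.028 * mw + p2 * (tra / 100.0) ** 4
    xn = tcla / 100.0
    xf = tcla / 50.0
    hc = hcf
    n = 0
    while abs(xn - xf) > 1.5e-4:
        xf = (xf + xn) / 2.0
        hcn = 2.38 * abs(100.0 * xf - taa) ** 0.25  # natural-convection coefficient
        hc = max(hcf, hcn)
        xn = (p5 + p4 * hc - p2 * xf ** 4) / (100.0 + p3 * hc)
        n += 1
        if n > 150:
            raise RuntimeError("clothing surface temperature did not converge")
    tcl = 100.0 * xn - 273.0  # clothing surface temperature [degC]

    # Heat-loss components [W/m2]
    hl1 = 3.05e-3 * (5733.0 - 6.99 * mw - pa)          # diffusion through the skin
    hl2 = 0.42 * (mw - 58.15) if mw > 58.15 else 0.0   # sweating
    hl3 = 1.7e-5 * m * (5867.0 - pa)                   # latent respiration
    hl4 = 0.0014 * m * (34.0 - ta)                     # dry respiration
    hl5 = 3.96 * fcl * (xn ** 4 - (tra / 100.0) ** 4)  # radiation
    hl6 = fcl * hc * (tcl - ta)                        # convection

    ts = 0.303 * math.exp(-0.036 * m) + 0.028  # thermal sensation transfer coefficient
    pmv = ts * (mw - hl1 - hl2 - hl3 - hl4 - hl5 - hl6)
    ppd = 100.0 - 95.0 * math.exp(-0.03353 * pmv ** 4 - 0.2179 * pmv ** 2)
    return pmv, ppd


# Example: a lightly clothed, seated occupant in a 22 degC room.
pmv, ppd = pmv_ppd(ta=22.0, tr=22.0, vel=0.1, rh=60.0, met=1.2, clo=0.5)
print(f"PMV={pmv:.2f}, PPD={ppd:.0f}%")
```

PMV is expressed on the seven-point thermal sensation scale from -3 (cold) to +3 (hot), so a comfort term built on it measures deviation from thermal neutrality (PMV = 0) rather than, say, distance from a fixed temperature setpoint.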

Key facts

PIRS stands for Physics-Informed Reward Shaping
It replaces ad-hoc comfort proxies with the ISO 7730 PMV formulation
Used in Soft Actor-Critic (SAC) for building energy management
Evaluated in CityLearn v2.1.2 on the 2022 challenge, phase 1
Centralized SAC agent trained for 50,000 steps
Improves reward interpretability
Leaves the other components of the learning pipeline unchanged
Addresses occupant comfort and grid-aware energy efficiency
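The source does not give the exact PIRS reward, so the following is only a hypothetical sketch of how a PMV value could enter a per-step SAC reward next to an energy term. It penalizes only the part of |PMV| that lies outside the ISO 7730 Category B band (-0.5 < PMV < +0.5) and counts grid imports as the energy cost. The function name shaped_reward, the ShapingWeights class, the weights, the band choice, and the use of net electricity as the energy signal are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapingWeights:
    """Hypothetical weights; not taken from the PIRS paper."""
    comfort: float = 1.0
    energy: float = 1.0
    pmv_band: float = 0.5  # ISO 7730 Category B comfort band: -0.5 < PMV < +0.5


def shaped_reward(pmv: float, net_electricity_kwh: float,
                  weights: ShapingWeights = ShapingWeights()) -> float:
    """Hypothetical per-step reward (higher is better).

    pmv: Predicted Mean Vote for the zone at this step.
    net_electricity_kwh: net grid import for the step; exports (negative values) earn nothing here.
    """
    # Comfort penalty: zero inside the band, growing linearly with |PMV| outside it.
    comfort_violation = max(0.0, abs(pmv) - weights.pmv_band)
    # Energy penalty: only grid imports are counted.
    energy_cost = max(0.0, net_electricity_kwh)
    return -(weights.comfort * comfort_violation + weights.energy * energy_cost)


# Same energy use, different thermal conditions.
for pmv in (0.3, -0.8, 1.2):
    print(pmv, shaped_reward(pmv, net_electricity_kwh=2.0))
```

With a dead band like this, the comfort term stays silent while conditions are acceptable and the energy term dominates; whether PIRS uses a band, a squared PMV, or PPD directly is not stated in the source.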


Sources

arXiv cs.AI — 2026-05-28