Physics-Informed Reward Shaping Improves Building Energy Management
A new method called PIRS (Physics-Informed Reward Shaping) replaces ad-hoc comfort proxies in deep reinforcement learning for building energy management. Designed for Soft Actor-Critic (SAC) agents, PIRS grounds the comfort signal in thermal-comfort physics by using the ISO 7730 Predicted Mean Vote (PMV) model. This makes the reward more interpretable and provides a standards-compliant comfort proxy, while leaving the rest of the learning pipeline unchanged. PIRS was evaluated in CityLearn v2.1.2 on phase 1 of the CityLearn Challenge 2022, where a centralized SAC agent was trained for 50,000 steps with a reward that addresses occupant comfort alongside grid-aware energy efficiency.
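The PMV model that PIRS builds on is fully specified in ISO 7730. The sketch below is a minimal Python version of that standard's iterative PMV/PPD calculation (solve for clothing surface temperature, then sum the body heat-loss terms). The function and parameter names are our own assumptions, it omits the standard's elevated-air-speed and other corrections, and it is not taken from the PIRS code. ISO 7730 recommends using PMV only within roughly -2 to +2.

```python
import math


def pmv_ppd(ta, tr, vel, rh, met, clo, wme=0.0):
    """Fanger PMV and PPD following the ISO 7730 calculation procedure (sketch).

    ta: air temperature (degC); tr: mean radiant temperature (degC);
    vel: relative air speed (m/s); rh: relative humidity (%);
    met: metabolic rate (met); clo: clothing insulation (clo);
    wme: external work (met), usually 0.
    """
    pa = rh * 10.0 * math.exp(16.6536 - 4030.183 / (ta + 235.0))  # water vapour pressure, Pa
    icl = 0.155 * clo                  # clothing insulation, m2K/W
    m = met * 58.15                    # metabolic rate, W/m2
    mw = m - wme * 58.15               # internal heat production, W/m2
    fcl = 1.0 + 1.29 * icl if icl <= 0.078 else 1.05 + 0.645 * icl  # clothing area factor
    hcf = 12.1 * math.sqrt(vel)        # forced-convection heat transfer coefficient
    taa, tra = ta + 273.0, tr + 273.0  # absolute temperatures (K), as in ISO 7730

    # Iteratively solve for clothing surface temperature, scaled as xn = tcl_K / 100.
    tcla = taa + (35.5 - ta) / (3.5 * icl + 0.1)
    p1 = icl * fcl
    p2, p3, p4 = p1 * 3.96, p1 * 100.0, p1 * taa
    p5 = 308.7 - 0.028 * mw + p2 * (tra / 100.0) ** 4
    xn, xf = tcla / 100.0, tcla / 50.0
    hc = hcf
    n = 0
    while abs(xn - xf) > 0.00015:
        xf = (xf + xn) / 2.0
        hcn = 2.38 * abs(100.0 * xf - taa) ** 0.25  # natural-convection coefficient
        hc = max(hcf, hcn)
        xn = (p5 + p4 * hc - p2 * xf ** 4) / (100.0 + p3 * hc)
        n += 1
        if n > 150:
            raise RuntimeError("clothing temperature iteration did not converge")
    tcl = 100.0 * xn - 273.0           # clothing surface temperature, degC

    # Heat-loss components (W/m2).
    hl1 = 3.05e-3 * (5733.0 - 6.99 * mw - pa)          # diffusion through skin
    hl2 = 0.42 * (mw - 58.15) if mw > 58.15 else 0.0   # sweating
    hl3 = 1.7e-5 * m * (5867.0 - pa)                   # latent respiration
    hl4 = 0.0014 * m * (34.0 - ta)                     # dry respiration
    hl5 = 3.96 * fcl * (xn ** 4 - (tra / 100.0) ** 4)  # radiation
    hl6 = fcl * hc * (tcl - ta)                        # convection

    ts = 0.303 * math.exp(-0.036 * m) + 0.028          # thermal sensation coefficient
    pmv = ts * (mw - hl1 - hl2 - hl3 - hl4 - hl5 - hl6)
    ppd = 100.0 - 95.0 * math.exp(-0.03353 * pmv ** 4 - 0.2179 * pmv ** 2)
    return pmv, ppd


pmv, ppd = pmv_ppd(ta=22.0, tr=22.0, vel=0.1, rh=60.0, met=1.2, clo=0.5)
print(f"PMV={pmv:.2f}, PPD={ppd:.0f}%")
```

For a lightly clothed, sedentary occupant at 22 °C this gives a PMV of roughly -0.75 (slightly cool) and a PPD of about 17 %, so a comfort term built on it is expressed in units the standard itself defines rather than in an arbitrary temperature error.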
Key facts
- PIRS stands for Physics-Informed Reward Shaping
- Replaces ad-hoc comfort proxies with the ISO 7730 PMV formulation
- Used in Soft Actor-Critic (SAC) for building energy management
- Evaluated in CityLearn v2.1.2 on phase 1 of the CityLearn Challenge 2022
- Central SAC agent trained for 50k steps
- Improves reward interpretability
- Leaves the other components of the learning pipeline unchanged
- Addresses occupant comfort and grid-aware energy efficiency
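The summary does not give the PIRS reward equation. Purely as an illustration, here is a hypothetical per-step reward that feeds zone PMV values into a comfort term next to an energy-cost term and a grid-peak term. Every name, the weights, the peak threshold, and the 0.5 comfort band (the ISO 7730 category-B limit) are assumptions rather than the published formulation, and mapping CityLearn observations to these inputs is not shown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the summary does not say how PIRS combines its terms.
    energy: float = 1.0
    grid: float = 1.0
    comfort: float = 1.0


def comfort_penalty(pmv: float, band: float = 0.5) -> float:
    """Zero inside |PMV| <= band (0.5 = ISO 7730 category B), linear outside."""
    return max(0.0, abs(pmv) - band)


def shaped_reward(
    pmvs: Sequence[float],
    net_load_kw: float,
    price_per_kwh: float,
    peak_threshold_kw: float,
    step_hours: float = 1.0,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Hypothetical per-step reward for a centralized agent (higher is better)."""
    if not pmvs:
        raise ValueError("need at least one zone PMV")
    # Energy term: cost of electricity imported during the control step.
    energy_cost = price_per_kwh * max(0.0, net_load_kw) * step_hours
    # Grid-awareness term: penalize district load above a peak threshold.
    peak_excess = max(0.0, net_load_kw - peak_threshold_kw)
    # Physics-grounded comfort term: mean PMV excursion outside the comfort band.
    discomfort = sum(comfort_penalty(p) for p in pmvs) / len(pmvs)
    return -(weights.energy * energy_cost
             + weights.grid * peak_excess
             + weights.comfort * discomfort)


r = shaped_reward(pmvs=[0.2, -0.9], net_load_kw=14.0,
                  price_per_kwh=0.25, peak_threshold_kw=12.0)
```

The dead band around PMV = 0 means the agent is not penalized for deviations the standard already treats as acceptable, and each unit of comfort penalty corresponds to a readable distance on the PMV scale, which is one way a physics-based comfort term can make the reward easier to interpret.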
Entities
—