ARTFEED — Contemporary Art Intelligence

Physics-Informed Reward Shaping Improves Building Energy Management

other · 2026-05-28

A new method called Physics-Informed Reward Shaping (PIRS) replaces ad-hoc comfort proxies in deep reinforcement learning for building energy management. Designed for Soft Actor-Critic (SAC) agents, PIRS derives the comfort signal from the ISO 7730 Predicted Mean Vote (PMV) model, so the comfort term in the reward rests on an established thermal comfort standard. This makes the reward easier to interpret and supplies a standards-aligned comfort proxy while leaving the rest of the learning pipeline unchanged. PIRS was evaluated in CityLearn v2.1.2 (2022 challenge, phase 1), where a central SAC agent trained for 50,000 steps balanced occupant comfort, energy efficiency, and grid awareness.

Key facts

  • PIRS stands for Physics-Informed Reward Shaping
  • It replaces ad-hoc comfort proxies with the ISO 7730 PMV formulation
  • Designed for Soft Actor-Critic (SAC) agents in building energy management
  • Evaluated in CityLearn v2.1.2 (2022 challenge, phase 1)
  • Central SAC agent trained for 50,000 steps
  • Improves reward interpretability
  • Does not change other learning pipeline components
  • Addresses occupant comfort and grid-aware energy efficiency
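
To make the idea of a PMV-based comfort signal concrete, here is a minimal Python sketch. It is not the PIRS implementation, whose exact reward form is not given here. The function pmv_iso7730 follows the published ISO 7730 heat-balance equations. Everything in shaped_reward is a hypothetical illustration: the function name, the weights, the 0.5 comfort band (borrowed from the ISO 7730 category B range), and the assumptions that mean radiant temperature equals air temperature and that air speed, activity, and clothing are fixed constants.

```python
import math


def pmv_iso7730(ta, tr, vel, rh, met=1.2, clo=0.5, wme=0.0):
    """Predicted Mean Vote from the ISO 7730 heat-balance equations.

    ta, tr: air and mean radiant temperature (deg C); vel: relative air speed (m/s);
    rh: relative humidity (%); met, clo: activity and clothing; wme: external work (met).
    """
    pa = rh * 10.0 * math.exp(16.6536 - 4030.183 / (ta + 235.0))  # vapour pressure, Pa
    icl = 0.155 * clo             # clothing insulation, m2K/W
    m = met * 58.15               # metabolic rate, W/m2
    mw = m - wme * 58.15          # internal heat production, W/m2
    fcl = 1.0 + 1.29 * icl if icl <= 0.078 else 1.05 + 0.645 * icl
    hcf = 12.1 * math.sqrt(vel)   # forced-convection coefficient
    taa, tra = ta + 273.0, tr + 273.0

    # Solve for the clothing surface temperature by damped fixed-point iteration.
    p1 = icl * fcl
    p2, p3, p4 = p1 * 3.96, p1 * 100.0, p1 * taa
    p5 = 308.7 - 0.028 * mw + p2 * (tra / 100.0) ** 4
    xn = (taa + (35.5 - ta) / (3.5 * (6.45 * icl + 0.1))) / 100.0  # initial guess
    xf = xn
    for _ in range(150):
        xf = (xf + xn) / 2.0
        hcn = 2.38 * abs(100.0 * xf - taa) ** 0.25  # natural-convection coefficient
        hc = max(hcf, hcn)
        xn = (p5 + p4 * hc - p2 * xf ** 4) / (100.0 + p3 * hc)
        if abs(xn - xf) <= 1.5e-4:
            break
    else:
        raise ValueError("clothing temperature iteration did not converge")
    tcl = 100.0 * xn - 273.0

    hl1 = 3.05e-3 * (5733.0 - 6.99 * mw - pa)           # skin diffusion
    hl2 = 0.42 * (mw - 58.15) if mw > 58.15 else 0.0    # sweating
    hl3 = 1.7e-5 * m * (5867.0 - pa)                    # latent respiration
    hl4 = 0.0014 * m * (34.0 - ta)                      # dry respiration
    hl5 = 3.96 * fcl * (xn ** 4 - (tra / 100.0) ** 4)   # radiation
    hl6 = fcl * hc * (tcl - ta)                         # convection
    ts = 0.303 * math.exp(-0.036 * m) + 0.028           # thermal sensation coefficient
    return ts * (mw - hl1 - hl2 - hl3 - hl4 - hl5 - hl6)


def shaped_reward(energy_kwh, ta, rh, w_energy=1.0, w_comfort=1.0, band=0.5):
    """Hypothetical reward: penalise energy use and PMV excess beyond +/- band."""
    # Assumptions: tr equals ta, air speed is 0.1 m/s, default met and clo apply.
    pmv = pmv_iso7730(ta=ta, tr=ta, vel=0.1, rh=rh)
    discomfort = max(0.0, abs(pmv) - band)
    return -(w_energy * energy_kwh + w_comfort * discomfort)
```

Calling shaped_reward(1.0, 24.5, 50.0) and shaped_reward(1.0, 30.0, 50.0) shows the intended behaviour: at equal energy use, the warmer room earns a lower reward because its PMV falls outside the comfort band. In practice the weights would need tuning against the energy and grid terms of the environment.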
