Semi-Markov RL for EV Ride-Hailing with Feasibility Guarantees
This work presents a semi-Markov reinforcement learning framework for large-scale electric vehicle (EV) ride-hailing operations that keeps every executed action physically feasible. The problem is modeled as a hex-grid semi-Markov decision process (semi-MDP) with mixed actions: discrete choices (serving, repositioning, charging) combined with continuous charging power over variable decision durations. To maintain feasibility during both training and deployment, a masked, temperature-annealed actor produces high-level intentions, which a time-limited rolling mixed-integer linear program (MILP) projects onto the feasible set at each decision epoch, enforcing state-of-charge, charger-port, and feeder limits. To hedge against distributional shift in the uncertain, spatially correlated demand and travel times, a Soft Actor-Critic (SAC) agent with a graph-based representation is optimized against a Wasserstein-1 ambiguity set. The paper is available on arXiv under ID 2604.25848.
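The masked, temperature-annealed actor described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the action list, `masked_softmax`, and the geometric `anneal` schedule are assumptions for this example. Masking assigns zero probability to infeasible intentions, while a temperature that starts high and decays shifts the policy from exploration toward exploitation.

```python
# Illustrative sketch (not the paper's code) of a masked,
# temperature-annealed actor head over discrete intentions.
import math

ACTIONS = ["serve", "reposition", "charge"]  # discrete intention types

def masked_softmax(logits, mask, temperature):
    """Softmax over logits with infeasible actions masked out.

    mask[i] = True means action i is currently feasible.
    High temperature flattens the distribution (exploration);
    annealing it toward a small value sharpens it (exploitation).
    """
    scaled = [l / temperature if m else float("-inf")
              for l, m in zip(logits, mask)]
    mx = max(s for s in scaled if s != float("-inf"))
    exps = [math.exp(s - mx) if s != float("-inf") else 0.0 for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def anneal(step, t0=2.0, t_min=0.2, decay=0.999):
    """Geometric temperature schedule: t0 * decay**step, floored at t_min."""
    return max(t_min, t0 * decay ** step)

# Example: a low-SoC vehicle cannot serve or reposition, only charge,
# so the mask forces all probability mass onto "charge".
probs = masked_softmax([1.2, 0.4, 0.9], [False, False, True], anneal(0))
```

In the full framework, the sampled intention would then be handed to the rolling MILP projection, which resolves it into a concrete, constraint-satisfying assignment.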
Key facts
- The study focuses on city-scale EV ride-hailing fleet control.
- The problem is modeled as a hex-grid semi-MDP with mixed actions.
- Actions combine discrete decisions (serving, repositioning, charging) with continuous charging power and variable durations.
- A masked, temperature-annealed actor produces high-level intentions.
- A time-limited rolling MILP projects intentions onto the feasible set at each decision epoch.
- Constraints include state-of-charge, port, and feeder limits.
- Soft Actor-Critic (SAC) is optimized against a Wasserstein-1 ambiguity set.
- The paper is published on arXiv with ID 2604.25848.
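The Wasserstein-1 ambiguity set mentioned above consists of all distributions within a radius eps of a nominal distribution under the Wasserstein-1 metric; the robust SAC objective optimizes against the worst case in that ball. For 1-D empirical samples (e.g. demand counts at a single grid cell), W1 has a closed form as the mean absolute difference of sorted values. The function `w1` and the radius `eps` below are assumptions for this sketch, not the paper's code.

```python
# Illustrative sketch of the Wasserstein-1 distance between two 1-D
# empirical samples of equal size (e.g. observed vs. nominal demand
# at one hex cell). Not the paper's implementation.
def w1(xs, ys):
    """W1 between equal-size empirical samples: the mean |x_(i) - y_(i)|
    over sorted values, since the optimal 1-D coupling is monotone."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

nominal = [3, 5, 7, 9]    # nominal demand scenario
observed = [4, 5, 6, 10]  # perturbed scenario
eps = 1.0                 # ambiguity-set radius (a hyperparameter)
inside = w1(nominal, observed) <= eps  # is `observed` in the ambiguity set?
```

Multivariate, spatially correlated demand requires solving an optimal-transport problem rather than this sorting shortcut, but the ball-of-distributions picture is the same.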
Entities
Institutions
- arXiv