Dual-Window Smoothing for Implicit Action Chunking in Continuous Control
Dual-Window Smoothing (DWS) is a reinforcement learning framework that targets high-frequency oscillatory control signals, which threaten safety and stability in real-world applications. Explicit action chunking methods predict fixed-horizon trajectories, so the policy output dimension grows with the horizon length, which complicates optimization and undermines step-wise interaction. DWS instead maintains temporal coherence without enlarging the action space. It uses two windows: the execution window ensures physical smoothness through deterministic modulation, while the value window aligns temporal-difference targets across the horizon to mitigate the critic bias caused by open-loop execution. DWS also adds a lightweight temporal mechanism on the actor side. The work is described in a paper available on arXiv under ID 2605.19592.
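To make the execution-window idea concrete, the sketch below smooths per-step policy actions with a deterministic filter while leaving the action dimension unchanged. The summary does not say which modulation DWS uses, so the moving average here is a stand-in, and the names `ExecutionWindow` and `total_variation` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


class ExecutionWindow:
    """Hypothetical smoother: averages the last `window` raw actions.

    Assumption: a plain moving average stands in for the deterministic
    modulation, which the summary does not specify.
    """

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self._buffer: Deque[List[float]] = deque(maxlen=window)

    def reset(self) -> None:
        # Clear the history at the start of each episode.
        self._buffer.clear()

    def __call__(self, raw_action: Sequence[float]) -> List[float]:
        # The policy still emits one action per step; only the executed
        # action is smoothed, so the output dimension never grows.
        self._buffer.append(list(raw_action))
        n = len(self._buffer)
        return [sum(a[i] for a in self._buffer) / n for i in range(len(raw_action))]


def total_variation(actions: Sequence[Sequence[float]]) -> float:
    """Sum of absolute step-to-step changes, a simple oscillation measure."""
    return sum(
        abs(y - x)
        for prev, cur in zip(actions, actions[1:])
        for x, y in zip(prev, cur)
    )


# A maximally oscillating one-dimensional action sequence.
raw = [[1.0], [-1.0], [1.0], [-1.0], [1.0], [-1.0]]
smoother = ExecutionWindow(window=2)
smoothed = [smoother(a) for a in raw]

print(total_variation(raw))       # 10.0
print(total_variation(smoothed))  # 1.0
```

With a window of 2, the alternating sequence collapses to a single initial step followed by zeros, which illustrates how a deterministic filter can remove oscillation without the policy predicting a multi-step trajectory.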
Key facts
- Dual-Window Smoothing (DWS) is proposed for smooth continuous control in reinforcement learning.
- Explicit action chunking predicts fixed-horizon trajectories but scales policy output dimension with horizon length.
- DWS uses an execution window for physical smoothness via deterministic modulation.
- DWS uses a value window to align temporal-difference targets over the horizon.
- The value window's target alignment counteracts critic bias caused by open-loop execution.
- DWS includes a lightweight actor-side temporal mechanism.
- The paper is available on arXiv with ID 2605.19592.
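One common way to make a critic target span a multi-step horizon is the standard n-step return. The sketch below shows that computation as an illustration of the value-window idea. It is not taken from the paper: the exact DWS target is not given in this summary, and the function name `n_step_td_target` and its parameters are assumptions.

```python
from typing import Sequence


def n_step_td_target(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float,
    done: bool = False,
) -> float:
    """Standard n-step return over a window of rewards.

    Assumption: this generic target stands in for the value window;
    the summary does not give the DWS formula.
    """
    # Bootstrap from the critic's estimate at the end of the window,
    # unless the episode terminated inside the window.
    target = 0.0 if done else bootstrap_value
    # Fold rewards back to front: G = r_0 + gamma * (r_1 + gamma * (...)).
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# Three rewards of 1.0, discount 0.5, critic estimate 8.0 after the window:
# 1 + 0.5 + 0.25 + 0.125 * 8 = 2.75
print(n_step_td_target([1.0, 1.0, 1.0], bootstrap_value=8.0, gamma=0.5))
```

Because the target accumulates the rewards actually collected across the window before bootstrapping, the critic is trained against the same horizon over which actions are executed.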
Entities
Institutions
- arXiv