TimeRewarder: Learning Dense Reward from Passive Videos via Temporal Distance

ai-technology · 2026-05-22

TimeRewarder is a reward-learning method that derives progress-estimation signals from passive videos, such as human videos and robot demonstrations, by modeling the temporal distance between pairs of frames. The learned progress estimate then supplies step-wise proxy rewards that guide reinforcement learning. On ten Meta-World tasks, TimeRewarder substantially improves performance in sparse-reward settings, reaching nearly perfect success on 9 of the 10 tasks with only 200,000 environment interactions per task and outperforming earlier methods.
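
To make the idea of "temporal distance between frame pairs" concrete, here is a minimal sketch of how training targets could be built from a single video. The function name `sample_frame_pairs` and the normalization by video length are assumptions made for illustration; the summary does not specify the paper's exact labeling scheme.

```python
import numpy as np

def sample_frame_pairs(num_frames, num_pairs, rng):
    """Sample frame-index pairs (i, j) and a normalized temporal-distance label.

    Assumed label: (j - i) / (num_frames - 1), which lies in [-1, 1].
    A positive label means frame j comes later in the video than frame i.
    """
    if num_frames < 2:
        raise ValueError("a video needs at least two frames")
    i = rng.integers(0, num_frames, size=num_pairs)
    j = rng.integers(0, num_frames, size=num_pairs)
    labels = (j - i) / (num_frames - 1)
    return i, j, labels

# Example: draw four labeled pairs from a hypothetical 50-frame video.
rng = np.random.default_rng(0)
i, j, labels = sample_frame_pairs(num_frames=50, num_pairs=4, rng=rng)
```

A model trained to regress such labels from the two frames needs no action or reward annotations, which is what makes passive video usable as a training source.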

Key facts

TimeRewarder learns dense rewards from passive videos
Uses temporal distances between frame pairs
Tested on ten Meta-World tasks
Achieved nearly perfect success in 9/10 tasks
Used only 200,000 environment interactions per task
Outperformed previous methods
Addresses sparse-reward problems in RL
Published on arXiv with ID 2509.26627
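
As an illustration of how a temporal-distance model could yield a dense reward in a sparse-reward task, the sketch below scores each transition by the predicted distance between consecutive observations. Both `stepwise_rewards` and `toy_predictor` are hypothetical names; the toy predictor is a stand-in for a learned model, and the reward rule shown is an assumed simplification rather than the paper's confirmed formulation.

```python
import numpy as np

def stepwise_rewards(observations, predict_distance):
    """Return one proxy reward per transition: predict_distance(o_t, o_{t+1})."""
    return np.array([
        predict_distance(observations[t], observations[t + 1])
        for t in range(len(observations) - 1)
    ])

def toy_predictor(frame_a, frame_b):
    # Stand-in for a learned model: treats the first feature as task progress.
    return float(frame_b[0] - frame_a[0])

# A rollout whose progress rises, dips once, then completes.
rollout = np.array([[0.0], [0.25], [0.5], [0.4], [1.0]])
rewards = stepwise_rewards(rollout, toy_predictor)
# rewards is approximately [0.25, 0.25, -0.1, 0.6]
```

Every step receives a signal, and the step that moves away from completion is penalized, whereas a sparse reward would stay silent until the task succeeds.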
