ASH: AI Agent Learns Embodied Tasks from Internet Video
Researchers have introduced an AI system called ASH (Agents that Self-Hone), which learns complex, long-horizon embodied tasks from unlabeled online videos without expert annotation or reward shaping. The system runs a self-improvement loop: it trains an Inverse Dynamics Model (IDM) on its own trajectories, then uses that IDM to learn from relevant internet clips. It also employs unsupervised learning to identify key events in large video collections, storing these as long-term memories to support intricate planning. ASH was evaluated in challenging gaming scenarios, specifically Pokémon Emerald and The Legend of Zelda: The Minish Cap, outperforming behavioral cloning, retrieval-augmented, and zero-shot baselines. The research is available on arXiv under the identifier 2605.14211.
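The article does not detail ASH's IDM architecture, but the general idea of an inverse dynamics model can be illustrated with a toy sketch: given an observation and the next observation, predict the action that caused the transition. Everything below (the 1-D synthetic environment, the logistic-regression model, and all function names) is an assumption for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_trajectories(n=2000):
    """Toy 1-D world: action 0 moves left (-1), action 1 moves right (+1)."""
    obs = rng.normal(size=n)
    actions = rng.integers(0, 2, size=n)
    next_obs = obs + (2 * actions - 1) + 0.1 * rng.normal(size=n)
    return obs, actions, next_obs

def train_idm(obs, actions, next_obs, lr=0.5, epochs=200):
    """Logistic-regression IDM: estimates p(action = 1 | o_t, o_{t+1})."""
    X = np.stack([obs, next_obs, next_obs - obs], axis=1)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the logit
        grad = p - actions                       # cross-entropy gradient w.r.t. logit
        w -= lr * (X.T @ grad) / len(actions)
        b -= lr * grad.mean()
    return w, b

def predict_actions(w, b, obs, next_obs):
    """Label transitions with the most likely action."""
    X = np.stack([obs, next_obs, next_obs - obs], axis=1)
    return (X @ w + b > 0).astype(int)

obs, actions, next_obs = make_trajectories()
w, b = train_idm(obs, actions, next_obs)
acc = (predict_actions(w, b, obs, next_obs) == actions).mean()
print(f"toy IDM training accuracy: {acc:.2f}")
```

Once such a model is trained on the agent's own experience, it can pseudo-label the actions in unlabeled internet video, converting raw clips into (observation, action) pairs suitable for imitation learning.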
Key facts
- ASH stands for Agents that Self-Hone via Embodied Learning.
- ASH learns from unlabeled, noisy internet video without reward shaping or expert annotation.
- ASH uses a self-improvement loop that learns an Inverse Dynamics Model (IDM) from its own trajectories.
- ASH employs unsupervised learning to identify key moments from internet video and stores them as long-term memory.
- ASH was evaluated on Pokémon Emerald and The Legend of Zelda: The Minish Cap.
- Pokémon Emerald is a turn-based RPG; The Legend of Zelda: The Minish Cap is a real-time action-adventure game.
- ASH outperformed behavioral cloning, retrieval-augmented methods, and zero-shot approaches.
- The research is published on arXiv with ID 2605.14211.
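The summary above says ASH identifies key moments in video without supervision but does not say how. One common unsupervised approach, sketched here purely as an assumption, is to embed each frame and flag time steps where consecutive embeddings change sharply, treating those as scene-level events worth storing in long-term memory.

```python
import numpy as np

rng = np.random.default_rng(1)

def detect_key_moments(embeddings, z_thresh=3.0):
    """Flag frames where the frame-to-frame embedding change is an outlier."""
    deltas = np.linalg.norm(np.diff(embeddings, axis=0), axis=1)
    z = (deltas - deltas.mean()) / deltas.std()  # standardize the change signal
    return [i + 1 for i in np.nonzero(z > z_thresh)[0]]

# Synthetic "video": three scenes with abrupt transitions at frames 40 and 70.
scenes = [np.zeros(8), np.full(8, 5.0), np.full(8, -4.0)]
frames = np.concatenate([
    base + 0.1 * rng.normal(size=(n, 8))
    for base, n in zip(scenes, [40, 30, 30])
])

key_moments = detect_key_moments(frames)
print(key_moments)  # → [40, 70]
```

In a real pipeline the embeddings would come from a learned visual encoder rather than raw vectors, and each detected moment would be stored with surrounding context so the planner can retrieve it later.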