ARTFEED — Contemporary Art Intelligence

MOMO: A multimodal framework for robot skill learning and adaptation

ai-technology · 2026-04-24

MOMO is an interactive framework for adapting robot skills through three modalities: kinesthetic touch for spatial adjustments, natural language for semantic changes, and a graphical web interface that lets users visualize trajectories and modify via-points by drag-and-drop. The system comprises five components: energy-based detection of human intentions, a tool-based LLM architecture that selects and parameterizes safe language-adaptation functions, Kernelized Movement Primitives (KMPs) for motion encoding, probabilistic Virtual Fixtures for guided demonstrations, and the web interface itself. Aimed at non-expert users, the framework is intended to make industrial robots easier to adapt to varying tasks and environments and to support flexible human-robot interaction. The paper is available on arXiv under ID 2604.20468.
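
The tool-based LLM component constrains language-driven adaptation to a fixed set of predefined, bounded functions that the model can only select and parameterize. Below is a minimal sketch of that pattern, assuming a hypothetical tool registry and JSON tool-call format; the function names, parameter ranges, and schema are illustrative assumptions, not MOMO's actual tools.

```python
# Hypothetical sketch of a tool-based adaptation dispatcher: the LLM may
# only pick from a fixed registry of safe, range-checked functions.
import json

def scale_speed(factor: float) -> str:
    """Rescale trajectory duration (illustrative bounds, not from the paper)."""
    if not 0.1 <= factor <= 2.0:               # reject unsafe parameterizations
        raise ValueError("speed factor out of safe range")
    return f"trajectory duration rescaled by {factor:.2f}"

def offset_goal(dx: float, dy: float, dz: float) -> str:
    """Shift the goal pose by a small Cartesian offset (limits are assumed)."""
    if max(abs(dx), abs(dy), abs(dz)) > 0.10:   # cap offsets at 10 cm
        raise ValueError("goal offset exceeds safe workspace margin")
    return f"goal shifted by ({dx}, {dy}, {dz}) m"

TOOLS = {"scale_speed": scale_speed, "offset_goal": offset_goal}

def apply_language_adaptation(llm_output: str) -> str:
    """Parse the LLM's tool call (assumed to be JSON) and dispatch it safely."""
    call = json.loads(llm_output)
    tool = TOOLS[call["tool"]]                  # unknown tools raise KeyError
    return tool(**call["args"])

# Example: the LLM has mapped "move a bit more to the left" to a tool call.
print(apply_language_adaptation(
    '{"tool": "offset_goal", "args": {"dx": 0.0, "dy": -0.05, "dz": 0.0}}'))
```

The point of the pattern is that free-form language never reaches the robot directly: only whitelisted, parameter-checked adaptation functions can modify the skill.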

Key facts

  • MOMO enables robot skill adaptation via kinesthetic touch, natural language, and a graphical web interface.
  • The framework uses a tool-based LLM architecture that selects and parameterizes predefined functions.
  • It integrates energy-based human-intention detection, KMPs, and probabilistic Virtual Fixtures (a KMP via-point sketch follows this list).
  • The system targets non-expert users for flexible industrial robot applications.
  • The paper was published on arXiv with ID 2604.20468.
  • The approach aims to make adaptation to varying tasks and environments straightforward.
  • The web interface supports visualizing geometric relations and trajectories.
  • Natural language is used for high-level semantic modifications.
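
KMPs encode the demonstrated motion so that it can be reshaped locally, which is what makes drag-and-drop via-point edits in the web interface possible. The snippet below is a generic sketch of via-point conditioning using kernel ridge regression, under the assumption that a via-point is simply an extra reference sample with high precision; it is not MOMO's implementation, and the trajectory, kernel length, and noise values are illustrative.

```python
# Via-point adaptation in the spirit of Kernelized Movement Primitives:
# a reference trajectory is encoded by kernel regression over time, and a
# dragged via-point is appended as an extra, tightly weighted reference
# sample before re-solving, which locally bends the reproduced motion.
import numpy as np

def rbf(a, b, length=0.1):
    """Squared-exponential kernel between two vectors of time stamps."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length ** 2))

def kmp_predict(t_ref, y_ref, t_query, noise):
    """Heteroscedastic kernel ridge regression over the reference samples."""
    K = rbf(t_ref, t_ref) + np.diag(noise)
    weights = np.linalg.solve(K, y_ref)
    return rbf(t_query, t_ref) @ weights

# Demonstrated 1-D trajectory (e.g. recorded via kinesthetic teaching).
t_demo = np.linspace(0.0, 1.0, 20)
y_demo = np.sin(np.pi * t_demo)

# A via-point dragged in the (hypothetical) web interface: pass through 0.2 at t = 0.5.
t_ref = np.append(t_demo, 0.5)
y_ref = np.append(y_demo, 0.2)
noise = np.append(np.full(20, 1e-2), 1e-6)   # via-point gets much higher precision

t_query = np.linspace(0.0, 1.0, 100)
adapted = kmp_predict(t_ref, y_ref, t_query, noise)
print("value near t=0.5:", adapted[np.argmin(np.abs(t_query - 0.5))])
```

Giving the via-point a much smaller noise term is what pulls the reproduced trajectory through the edited point while leaving the rest of the demonstration largely intact.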

Entities

Institutions

  • arXiv

Sources