ARTFEED — Contemporary Art Intelligence

MOMO: A multimodal framework for robot skill learning and adaptation

ai-technology · 2026-04-24

MOMO is an interactive framework for adapting robot skills through three modalities: kinesthetic touch for spatial adjustments, natural language for semantic changes, and a graphical web interface that lets users visualize trajectories and modify via-points by drag-and-drop. The system comprises five components: energy-based detection of human intentions, a tool-based LLM architecture that selects and parameterizes safe language-adaptation functions, Kernelized Movement Primitives (KMPs) for motion encoding, probabilistic Virtual Fixtures for guided demonstrations, and the web interface itself. Aimed at non-expert users, the framework is intended to make industrial robots easier to adapt to varying tasks and environments and to support flexible human-robot interaction. The paper is available on arXiv under ID 2604.20468.
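
The tool-based LLM component constrains language-driven adaptation to a fixed set of predefined, bounded functions that the model can only select and parameterize. Below is a minimal sketch of that pattern, assuming a hypothetical tool registry and JSON tool-call format; the function names, parameter ranges, and schema are illustrative assumptions, not MOMO's actual tools.

```python
# Hypothetical sketch of a tool-based adaptation dispatcher: the LLM may
# only pick from a fixed registry of safe, range-checked functions.
import json

def scale_speed(factor: float) -> str:
    """Rescale trajectory duration (illustrative bounds, not from the paper)."""
    if not 0.1 <= factor <= 2.0:               # reject unsafe parameterizations
        raise ValueError("speed factor out of safe range")
    return f"trajectory duration rescaled by {factor:.2f}"

def offset_goal(dx: float, dy: float, dz: float) -> str:
    """Shift the goal pose by a small Cartesian offset (limits are assumed)."""
    if max(abs(dx), abs(dy), abs(dz)) > 0.10:   # cap offsets at 10 cm
        raise ValueError("goal offset exceeds safe workspace margin")
    return f"goal shifted by ({dx}, {dy}, {dz}) m"

TOOLS = {"scale_speed": scale_speed, "offset_goal": offset_goal}

def apply_language_adaptation(llm_output: str) -> str:
    """Parse the LLM's tool call (assumed to be JSON) and dispatch it safely."""
    call = json.loads(llm_output)
    tool = TOOLS[call["tool"]]                  # unknown tools raise KeyError
    return tool(**call["args"])

# Example: the LLM has mapped "move a bit more to the left" to a tool call.
print(apply_language_adaptation(
    '{"tool": "offset_goal", "args": {"dx": 0.0, "dy": -0.05, "dz": 0.0}}'))
```

The point of the pattern is that free-form language never reaches the robot directly: only whitelisted, parameter-checked adaptation functions can modify the skill.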

Key facts

  • MOMO enables robot skill adaptation via kinesthetic touch, natural language, and a graphical web interface.
  • The framework uses a tool-based LLM architecture that selects and parameterizes predefined functions.
  • It integrates energy-based human-intention detection, KMPs, and probabilistic Virtual Fixtures (a KMP via-point sketch follows this list).
  • The system targets non-expert users for flexible industrial robot applications.
  • The paper was published on arXiv with ID 2604.20468.
  • The approach aims to make adaptation to varying tasks and environments straightforward.
  • The web interface supports visualizing geometric relations and trajectories.
  • Natural language is used for high-level semantic modifications.
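
KMPs encode the demonstrated motion so that it can be reshaped locally, which is what makes drag-and-drop via-point edits in the web interface possible. The snippet below is a generic sketch of via-point conditioning using kernel ridge regression, under the assumption that a via-point is simply an extra reference sample with high precision; it is not MOMO's implementation, and the trajectory, kernel length, and noise values are illustrative.

```python
# Via-point adaptation in the spirit of Kernelized Movement Primitives:
# a reference trajectory is encoded by kernel regression over time, and a
# dragged via-point is appended as an extra, tightly weighted reference
# sample before re-solving, which locally bends the reproduced motion.
import numpy as np

def rbf(a, b, length=0.1):
    """Squared-exponential kernel between two vectors of time stamps."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length ** 2))

def kmp_predict(t_ref, y_ref, t_query, noise):
    """Heteroscedastic kernel ridge regression over the reference samples."""
    K = rbf(t_ref, t_ref) + np.diag(noise)
    weights = np.linalg.solve(K, y_ref)
    return rbf(t_query, t_ref) @ weights

# Demonstrated 1-D trajectory (e.g. recorded via kinesthetic teaching).
t_demo = np.linspace(0.0, 1.0, 20)
y_demo = np.sin(np.pi * t_demo)

# A via-point dragged in the (hypothetical) web interface: pass through 0.2 at t = 0.5.
t_ref = np.append(t_demo, 0.5)
y_ref = np.append(y_demo, 0.2)
noise = np.append(np.full(20, 1e-2), 1e-6)   # via-point gets much higher precision

t_query = np.linspace(0.0, 1.0, 100)
adapted = kmp_predict(t_ref, y_ref, t_query, noise)
print("value near t=0.5:", adapted[np.argmin(np.abs(t_query - 0.5))])
```

Giving the via-point a much smaller noise term is what pulls the reproduced trajectory through the edited point while leaving the rest of the demonstration largely intact.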

Entities

Institutions

  • arXiv

Sources