RLDX-1: A New Robotic Policy for Dexterous Manipulation

ai-technology · 2026-05-07

RLDX-1 is a new general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT). MSAT processes each input modality in its own stream and fuses the streams through cross-modal joint self-attention. The design targets three shortcomings of existing Vision-Language-Action models (VLAs): limited motion awareness, long-term memory, and physical sensing. On top of MSAT, RLDX-1 adds system-level design choices, including data synthesis for rare manipulation scenarios and learning procedures specialized for human-like manipulation. The technical report is available on arXiv under identifier 2605.03269.

Key facts

RLDX-1 is a general-purpose robotic policy for dexterous manipulation.
It is built on the Multi-Stream Action Transformer (MSAT) architecture.
MSAT integrates heterogeneous modalities via modality-specific streams with cross-modal joint self-attention.
RLDX-1 addresses limitations in VLAs: motion awareness, long-term memory, and physical sensing.
System-level design choices include data synthesis for rare manipulation scenarios.
Learning procedures are specialized for human-like manipulation.
The technical report is on arXiv (identifier 2605.03269).
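The key facts describe MSAT only at a high level. To illustrate what "modality-specific streams with cross-modal joint self-attention" can mean, here is a minimal single-head NumPy sketch. It assumes that each stream keeps its own query/key/value/output weights and that attention is computed jointly over the concatenated tokens of all streams. The stream names (`vision`, `language`, `tactile`, `action`), the `StreamParams` class, and `joint_self_attention` are hypothetical. This is a generic pattern, not the report's actual MSAT layer.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class StreamParams:
    """Hypothetical per-modality weights: each stream has its own projections."""

    def __init__(self, d_model, rng):
        scale = 1.0 / np.sqrt(d_model)
        self.wq = rng.standard_normal((d_model, d_model)) * scale
        self.wk = rng.standard_normal((d_model, d_model)) * scale
        self.wv = rng.standard_normal((d_model, d_model)) * scale
        self.wo = rng.standard_normal((d_model, d_model)) * scale


def joint_self_attention(streams, params):
    """streams: dict name -> (n_tokens, d_model) array.
    Tokens are projected with their own stream's weights, then every query
    attends over the keys/values of ALL streams (cross-modal joint attention)."""
    names = list(streams)
    q = np.concatenate([streams[n] @ params[n].wq for n in names], axis=0)
    k = np.concatenate([streams[n] @ params[n].wk for n in names], axis=0)
    v = np.concatenate([streams[n] @ params[n].wv for n in names], axis=0)
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (total_tokens, total_tokens)
    mixed = attn @ v

    # Split the joint result back into streams; apply each stream's own
    # output projection plus a residual connection.
    out, start = {}, 0
    for n in names:
        length = streams[n].shape[0]
        out[n] = streams[n] + mixed[start:start + length] @ params[n].wo
        start += length
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model = 16
    streams = {  # hypothetical modalities and token counts
        "vision": rng.standard_normal((6, d_model)),
        "language": rng.standard_normal((4, d_model)),
        "tactile": rng.standard_normal((2, d_model)),
        "action": rng.standard_normal((3, d_model)),
    }
    params = {name: StreamParams(d_model, rng) for name in streams}
    out = joint_self_attention(streams, params)
    for name, x in out.items():
        print(name, x.shape)
```

Every query attends over the keys of all streams, so information from one modality (for example, tactile tokens) can reach every other stream's output within a single layer. The per-stream weights let each modality keep its own parameters.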
