RLDX-1: A New Robotic Policy for Dexterous Manipulation
RLDX-1 is a new general-purpose robotic policy for versatile dexterous manipulation, built on the Multi-Stream Action Transformer (MSAT). MSAT integrates heterogeneous modalities through modality-specific streams that interact via cross-modal joint self-attention. This design targets limitations of existing Vision-Language-Action models (VLAs) in motion awareness, long-term memory, and physical sensing. RLDX-1 complements MSAT with system-level design choices, including data synthesis for rare manipulation scenarios and learning procedures specialized for human-like manipulation. The technical report is available on arXiv under identifier 2605.03269.
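The summary gives no internal details of MSAT beyond "modality-specific streams with cross-modal joint self-attention." Below is a minimal sketch of one common reading of that phrase, and it is not RLDX-1's implementation. Each stream keeps its own query, key, value, and output projections, while the tokens of all streams are concatenated and attend to one another in a single softmax. The modality names, token counts, dimension, random weights, residual connection, and the helpers `StreamWeights` and `joint_self_attention` are all hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class StreamWeights:
    """Hypothetical per-modality projections: each stream has its own Q, K, V, O."""

    def __init__(self, dim: int, rng: random.Random):
        self.wq = random_matrix(dim, dim, rng)
        self.wk = random_matrix(dim, dim, rng)
        self.wv = random_matrix(dim, dim, rng)
        self.wo = random_matrix(dim, dim, rng)


def joint_self_attention(streams: dict[str, Matrix],
                         weights: dict[str, StreamWeights]) -> dict[str, Matrix]:
    """Project each stream with its own weights, attend jointly, split back per stream."""
    names = list(streams)
    q, k, v = [], [], []
    for name in names:
        x, w = streams[name], weights[name]
        q += matmul(x, w.wq)  # modality-specific projections...
        k += matmul(x, w.wk)
        v += matmul(x, w.wv)  # ...concatenated into one joint token sequence
    dim = len(q[0])
    scores = [[s / math.sqrt(dim) for s in row] for row in matmul(q, transpose(k))]
    attended = matmul(softmax_rows(scores), v)  # every token attends to every stream

    out, start = {}, 0
    for name in names:
        n = len(streams[name])
        projected = matmul(attended[start:start + n], weights[name].wo)
        # Residual connection back into the stream's own token sequence.
        out[name] = [[a + b for a, b in zip(x_row, p_row)]
                     for x_row, p_row in zip(streams[name], projected)]
        start += n
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    streams = {  # hypothetical modalities and token counts
        "vision": random_matrix(4, dim, rng),
        "language": random_matrix(3, dim, rng),
        "tactile": random_matrix(2, dim, rng),
        "action": random_matrix(5, dim, rng),
    }
    weights = {name: StreamWeights(dim, rng) for name in streams}
    out = joint_self_attention(streams, weights)
    print({name: (len(t), len(t[0])) for name, t in out.items()})
    # prints {'vision': (4, 8), 'language': (3, 8), 'tactile': (2, 8), 'action': (5, 8)}
```

Because the softmax runs over the concatenated sequence, perturbing the tactile tokens changes the action stream's output even though the two streams share no projection weights. Under this reading, that is the cross-modal coupling that "joint self-attention" refers to, while the per-stream weights keep each modality's representation separate.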
Key facts
- RLDX-1 is a general-purpose robotic policy for dexterous manipulation.
- It is built on the Multi-Stream Action Transformer (MSAT) architecture.
- MSAT integrates heterogeneous modalities via modality-specific streams with cross-modal joint self-attention.
- RLDX-1 addresses limitations of existing VLAs in motion awareness, long-term memory, and physical sensing.
- System-level design choices include data synthesis for rare manipulation scenarios.
- Learning procedures are specialized for human-like manipulation.
- The technical report is on arXiv (identifier 2605.03269).
Entities
Institutions
- arXiv