Libra-VLA: Coarse-to-Fine Dual-System for Robotic Manipulation
A new research paper introduces Libra-VLA, a Vision-Language-Action (VLA) model designed for generalist robotic manipulation. To address the limitations of monolithic generation paradigms, the model explicitly decouples learning complexity into a coarse-to-fine hierarchy: complex actions are modeled in a Hybrid Action Space that decomposes them into discrete macro-directional reaching and continuous micro-pose alignment. The architecture aims to bridge the semantic-actuation gap and reduce the representational burden of grounding high-level semantics in low-level continuous actions. The paper is published on arXiv under the identifier 2604.24921.
Key facts
- Libra-VLA is a novel Coarse-to-Fine Dual-System VLA architecture.
- It addresses the limitations of the monolithic generation paradigm in robotic manipulation.
- The model uses a Hybrid Action Space with discrete and continuous components.
- It decomposes actions into macro-directional reaching and micro-pose alignment.
- The goal is to bridge the semantic-actuation gap.
- The paper is available on arXiv with ID 2604.24921.
- The approach aims to reduce the representational burden of grounding high-level semantics in low-level continuous actions.
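The hybrid decomposition described above can be illustrated with a minimal sketch. The paper's exact parameterization is not given here, so the direction set, bin count, and residual computation below are illustrative assumptions, not Libra-VLA's actual method: a raw end-effector displacement is split into a discrete coarse-direction bin plus a continuous fine residual.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HybridAction:
    """Illustrative hybrid action: one discrete and one continuous part."""
    direction_bin: int        # discrete macro-directional reaching command
    pose_delta: np.ndarray    # continuous micro-pose alignment residual

def decompose(target_delta: np.ndarray, directions: np.ndarray) -> HybridAction:
    """Split a raw displacement into a coarse direction bin plus a fine
    continuous residual (a hypothetical sketch, not the paper's method)."""
    unit = target_delta / (np.linalg.norm(target_delta) + 1e-8)
    bin_idx = int(np.argmax(directions @ unit))  # nearest coarse direction
    # Residual: remove the component already covered by the coarse direction.
    residual = target_delta - directions[bin_idx] * (directions[bin_idx] @ target_delta)
    return HybridAction(direction_bin=bin_idx, pose_delta=residual)

# Example with six axis-aligned coarse directions (an assumed direction set).
dirs = np.vstack([np.eye(3), -np.eye(3)])
act = decompose(np.array([0.9, 0.1, 0.0]), dirs)
# act.direction_bin selects +x; act.pose_delta carries the fine correction.
```

A discrete head (classification over bins) can then handle the reaching stage while a continuous head (regression or diffusion) handles the residual, which is one common way such hybrid spaces are trained.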