Libra-VLA: Coarse-to-Fine Dual-System for Robotic Manipulation
A new research paper introduces Libra-VLA, a Vision-Language-Action (VLA) model designed for generalist robotic manipulation. To address the limitations of monolithic generation paradigms, the model explicitly decouples learning complexity into a coarse-to-fine hierarchy: complex actions are modeled in a Hybrid Action Space that decomposes them into discrete macro-directional reaching and continuous micro-pose alignment. The architecture aims to bridge the semantic-actuation gap and reduce the representational burden of grounding high-level semantics in low-level continuous actions. The paper is published on arXiv under the identifier 2604.24921.
Key facts
- Libra-VLA is a novel Coarse-to-Fine Dual-System VLA architecture.
- It addresses the limitations of the monolithic generation paradigm in robotic manipulation.
- The model uses a Hybrid Action Space with discrete and continuous components.
- It decomposes actions into macro-directional reaching and micro-pose alignment.
- The goal is to bridge the semantic-actuation gap.
- The paper is available on arXiv with ID 2604.24921.
- The approach aims to reduce the representational burden of grounding high-level semantics in low-level continuous actions.
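The hybrid decomposition described above can be illustrated with a minimal sketch. The paper's exact parameterization is not given here, so the direction set, bin count, and residual computation below are illustrative assumptions, not Libra-VLA's actual method: a raw end-effector displacement is split into a discrete coarse-direction bin plus a continuous fine residual.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HybridAction:
    """Illustrative hybrid action: one discrete and one continuous part."""
    direction_bin: int        # discrete macro-directional reaching command
    pose_delta: np.ndarray    # continuous micro-pose alignment residual

def decompose(target_delta: np.ndarray, directions: np.ndarray) -> HybridAction:
    """Split a raw displacement into a coarse direction bin plus a fine
    continuous residual (a hypothetical sketch, not the paper's method)."""
    unit = target_delta / (np.linalg.norm(target_delta) + 1e-8)
    bin_idx = int(np.argmax(directions @ unit))  # nearest coarse direction
    # Residual: remove the component already covered by the coarse direction.
    residual = target_delta - directions[bin_idx] * (directions[bin_idx] @ target_delta)
    return HybridAction(direction_bin=bin_idx, pose_delta=residual)

# Example with six axis-aligned coarse directions (an assumed direction set).
dirs = np.vstack([np.eye(3), -np.eye(3)])
act = decompose(np.array([0.9, 0.1, 0.0]), dirs)
# act.direction_bin selects +x; act.pose_delta carries the fine correction.
```

A discrete head (classification over bins) can then handle the reaching stage while a continuous head (regression or diffusion) handles the residual, which is one common way such hybrid spaces are trained.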