ARTFEED — Contemporary Art Intelligence

Libra-VLA: Coarse-to-Fine Dual-System for Robotic Manipulation

other · 2026-04-30

A new research paper introduces Libra-VLA, a Vision-Language-Action (VLA) model designed for generalist robotic manipulation. The model addresses the limitations of the monolithic generation paradigm by explicitly decoupling learning complexity into a coarse-to-fine hierarchy: complex actions are modeled in a Hybrid Action Space that decomposes them into discrete macro-directional reaching and continuous micro-pose alignment. The architecture aims to bridge the semantic-actuation gap and reduce the representational burden of grounding high-level semantics to continuous actions. The paper is published on arXiv under the identifier 2604.24921.
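The coarse-to-fine decomposition can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the paper's implementation: the six-bin direction set, the function names, and the residual-based split are all hypothetical choices made here to show the idea of pairing a discrete macro-direction with a continuous micro-correction.

```python
import numpy as np

# Hypothetical illustration of a Hybrid Action Space: a coarse discrete
# macro-direction plus a fine continuous residual. The paper's actual
# parameterization is not specified here; the bin set is an assumption.
DIRECTION_BINS = np.array([
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
], dtype=float)  # six axis-aligned macro reaching directions

def decompose_action(translation):
    """Split a continuous end-effector displacement into a discrete
    macro-direction index and the continuous residual left for a
    micro-alignment stage to model."""
    # Coarse: pick the macro-direction bin most aligned with the motion.
    scores = DIRECTION_BINS @ translation
    bin_idx = int(np.argmax(scores))
    # Fine: subtract the coarse step to get the residual correction.
    coarse_step = DIRECTION_BINS[bin_idx] * np.linalg.norm(translation)
    residual = translation - coarse_step
    return bin_idx, residual

def compose_action(bin_idx, residual, magnitude):
    """Recombine the coarse direction and fine residual into a motion."""
    return DIRECTION_BINS[bin_idx] * magnitude + residual
```

Under this toy split, the coarse head only needs to solve a small classification problem, while the fine head regresses a bounded residual, which is one way to read the paper's claim about reducing the representational burden of continuous-action grounding.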

Key facts

  • Libra-VLA is a novel Coarse-to-Fine Dual-System VLA architecture.
  • It addresses limitations of the monolithic generation paradigm in robotic manipulation.
  • The model uses a Hybrid Action Space with discrete and continuous components.
  • It decomposes actions into macro-directional reaching and micro-pose alignment.
  • The goal is to bridge the semantic-actuation gap.
  • The paper is available on arXiv with ID 2604.24921.
  • The approach aims to reduce the representational burden of grounding high-level semantics to continuous actions.
