ARTFEED — Contemporary Art Intelligence

CrossVLA Improves VLA Post-Training with Cross-Paradigm DPO

ai-technology · 2026-05-23

Researchers have introduced CrossVLA, an empirical study of cross-paradigm post-training for Vision-Language-Action (VLA) models. They developed a surrogate flow-matching log-probability estimator that enables Direct Preference Optimisation (DPO) on continuous-action backbones without probability-flow ODE integration. The study also compares LoRA and DoRA for parameter-efficient VLA DPO: DoRA improved on the OpenVLA SFT baseline by a mean of +10.4 percentage points across the four LIBERO suites (600 trials, 3 seeds), with per-suite gains of +20.0 on Object, +11.0 on Long-horizon, +8.0 on Goal, and +2.7 on Spatial. On Object there was zero variance across seeds (38/50 on each seed). The work addresses a gap: DPO had been studied almost exclusively on autoregressive VLAs such as OpenVLA, and CrossVLA extends it to continuous-action flow-matching models such as pi-0.5.
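To make the idea concrete, here is a minimal sketch of how a DPO loss could be driven by a surrogate log-probability on a flow-matching policy. It is not the CrossVLA estimator, whose exact form is not given here. It assumes a common stand-in, the negative Monte Carlo flow-matching error, with a linear noise-to-action path. The names surrogate_logp, dpo_loss, toy_velocity and the parameters n_samples, t_max and beta are all hypothetical.

```python
import numpy as np

def surrogate_logp(velocity_fn, action, obs, rng, n_samples=16, t_max=0.95):
    """Assumed surrogate: negative mean flow-matching error for `action`.

    No probability-flow ODE is integrated; we only score how well the
    velocity field explains the straight path from noise to `action`.
    """
    errors = []
    for _ in range(n_samples):
        t = rng.uniform(0.0, t_max)             # t_max < 1 keeps the toy stable
        noise = rng.standard_normal(action.shape)
        x_t = (1.0 - t) * noise + t * action    # point on the linear path
        target = action - noise                 # velocity of that path
        pred = velocity_fn(x_t, t, obs)
        errors.append(np.sum((pred - target) ** 2))
    return -float(np.mean(errors))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss, -log sigmoid(beta * margin), computed stably."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return float(np.logaddexp(0.0, -margin))

# Toy usage: a "policy" whose velocity field always transports noise to `expert`.
rng = np.random.default_rng(0)
expert = np.array([0.5, -0.2])

def toy_velocity(x_t, t, obs):
    return (expert - x_t) / (1.0 - t)

chosen, rejected = expert, np.array([-1.0, 1.0])
lp_w = surrogate_logp(toy_velocity, chosen, None, rng)
lp_l = surrogate_logp(toy_velocity, rejected, None, rng)
# Reference scores are placeholder constants, purely for illustration.
loss = dpo_loss(lp_w, lp_l, ref_logp_w=-1.0, ref_logp_l=-1.0)
```

In this toy setup the chosen action scores higher than the rejected one, so the loss falls below log 2, the value DPO gives when policy and reference agree. In a real pipeline the reference scores would come from a frozen copy of the policy, not from constants.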

Key facts

  • CrossVLA is an empirical study of cross-paradigm VLA post-training.
  • A surrogate flow-matching log-probability estimator allows DPO on continuous-action backbones.
  • In a LoRA-versus-DoRA comparison for VLA DPO, DoRA improves on OpenVLA SFT by a mean of +10.4 pp.
  • Results cover 600 trials across the four LIBERO suites, with 3 seeds.
  • Per-suite gains: +20.0 Object, +11.0 Long-horizon, +8.0 Goal, +2.7 Spatial.
  • Zero variance across seeds on the Object suite (38/50 on each seed).
  • DPO had previously been studied almost exclusively on autoregressive VLAs such as OpenVLA.
  • CrossVLA extends DPO to continuous-action flow-matching models such as pi-0.5.
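
The reported figures can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers listed above. The figure of 50 trials per suite per seed is an inference from the 600-trial total and the "38/50" figure, not something stated separately.

```python
# Per-suite gains over OpenVLA SFT, in percentage points (from the key facts).
gains_pp = {"Object": 20.0, "Long-horizon": 11.0, "Goal": 8.0, "Spatial": 2.7}

# Unweighted mean across the four suites.
mean_gain = sum(gains_pp.values()) / len(gains_pp)

# Trial accounting: 600 trials spread over 4 suites and 3 seeds.
n_suites, n_seeds, total_trials = len(gains_pp), 3, 600
trials_per_run = total_trials // (n_suites * n_seeds)  # inferred, not reported

# Object-suite rate implied by 38/50 on each seed.
object_rate = 38 / trials_per_run

print(f"mean gain: {mean_gain:.3f} pp")
print(f"trials per suite per seed: {trials_per_run}")
print(f"Object rate per seed: {object_rate:.0%}")
```

The unweighted mean of the four gains is 10.425 pp, matching the reported +10.4 pp to one decimal place, and 600 trials over 4 suites and 3 seeds gives 50 trials per run, matching the denominator in "38/50".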

Entities

Institutions

  • arXiv

Sources