CrossVLA Improves VLA Post-Training with Cross-Paradigm DPO
Researchers introduced CrossVLA, an empirical study of cross-paradigm post-training for Vision-Language-Action (VLA) models. The study develops a surrogate flow-matching log-probability estimator that enables Direct Preference Optimisation (DPO) on continuous-action backbones without probability-flow ODE integration. In a comparison of LoRA and DoRA for parameter-efficient VLA DPO, DoRA improved over OpenVLA SFT by a mean of +10.4 percentage points across the four LIBERO suites (600 trials, 3 seeds). Per-suite gains were +20.0 on Object, +11.0 on Long-horizon, +8.0 on Goal, and +2.7 on Spatial, with zero seed variance on Object (38/50 on each seed). The work addresses a gap: DPO had been studied almost exclusively on autoregressive VLAs such as OpenVLA, and CrossVLA extends it to continuous-action flow-matching models such as pi-0.5.
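To make the idea of a surrogate log-probability concrete, here is a minimal sketch of how a DPO loss could be computed for a flow-matching policy without integrating a probability-flow ODE. The summary does not specify CrossVLA's estimator, so everything here is an assumption for illustration: the surrogate is taken to be the negative flow-matching regression error (in the spirit of diffusion-style DPO variants), the interpolation path is linear from noise to action, and the names `surrogate_logp`, `velocity_fn`, and `dpo_loss` are hypothetical.

```python
import numpy as np

def surrogate_logp(velocity_fn, action, obs, rng, n_samples=8):
    """Hypothetical surrogate score: negative mean flow-matching error.

    Assumes a linear path x_t = (1 - t) * noise + t * action, whose
    velocity target is (action - noise). No ODE is integrated.
    """
    errs = []
    for _ in range(n_samples):
        t = rng.uniform()                        # time in [0, 1)
        noise = rng.standard_normal(action.shape)
        x_t = (1.0 - t) * noise + t * action     # point on the path
        target = action - noise                  # velocity target
        pred = velocity_fn(x_t, t, obs)          # model's predicted velocity
        errs.append(np.sum((pred - target) ** 2))
    return -float(np.mean(errs))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO objective for one (preferred, rejected) pair."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return float(np.logaddexp(0.0, -margin))     # equals -log(sigmoid(margin))
```

In this sketch the policy and the frozen reference would each be scored with `surrogate_logp` on the preferred and rejected action chunks, ideally with identically seeded generators so both see the same noise and time samples, and the four scores are then passed to `dpo_loss`.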
Key facts
- CrossVLA is an empirical study of cross-paradigm VLA post-training.
- A surrogate flow-matching log-probability estimator allows DPO on continuous-action backbones.
- DoRA outperforms LoRA for VLA DPO, with a mean +10.4 pp improvement over OpenVLA SFT.
- Results cover 600 trials across the four LIBERO suites with 3 seeds.
- Per-suite gains: +20.0 Object, +11.0 Long-horizon, +8.0 Goal, +2.7 Spatial.
- Zero seed variance on the Object suite (38/50 on each seed).
- DPO had previously been studied almost exclusively on autoregressive VLAs like OpenVLA.
- CrossVLA extends DPO to continuous-action flow-matching models like pi-0.5.
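The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the summary. The one assumption is that each suite is evaluated with 50 trials per seed, which is inferred from the 38/50 figure and is consistent with the 600-trial total.

```python
# Per-suite gains over OpenVLA SFT, in percentage points (from the summary).
gains_pp = {"Object": 20.0, "Long-horizon": 11.0, "Goal": 8.0, "Spatial": 2.7}

# Unweighted mean across the four suites.
mean_gain = sum(gains_pp.values()) / len(gains_pp)
print(round(mean_gain, 1))  # 10.4

# Assumed evaluation layout: 4 suites x 3 seeds x 50 trials per seed.
n_suites, n_seeds, trials_per_seed = 4, 3, 50
total_trials = n_suites * n_seeds * trials_per_seed
print(total_trials)  # 600

# Object-suite success rate implied by 38 successes out of 50 trials.
object_rate = 38 / 50
print(f"{object_rate:.0%}")  # 76%
```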
Entities
Institutions
- arXiv