ARTFEED — Contemporary Art Intelligence

Sword Framework Enhances World Model Robustness for VLA Policy Training

ai-technology · 2026-05-11

Researchers propose Sword, a robust world-model framework that addresses poor generalization and long-horizon error accumulation in world models used as generative simulators for training Vision-Language-Action (VLA) policies. When deployed on benchmarks such as LIBERO, existing world models are sensitive to initial-state perturbations such as color and illumination changes, leading to cascading hallucinations and degraded future-state predictions in closed-loop rollouts. Sword introduces dynamic latent bootstrapping to mitigate these issues, making the simulator reliable enough for policy optimization entirely within imagination. The method targets post-training of VLA policies, improving simulator fidelity without requiring real-world interaction. The paper is available on arXiv under ID 2605.07288.
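This summary does not spell out how dynamic latent bootstrapping works, so the sketch below is only a generic illustration of the underlying idea: during a closed-loop latent rollout, each model-predicted latent is blended with a re-grounded ("bootstrapped") latent so small prediction errors cannot compound into hallucinations. The `dynamics` function, the blending rule, and the `alpha` coefficient are all assumptions for illustration, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamics(z, a):
    # Toy latent transition: stand-in for a learned world-model predictor.
    return np.tanh(z + 0.1 * a)

def rollout(z0, actions, bootstrap=None, alpha=0.3):
    """Closed-loop latent rollout.

    Without `bootstrap`, each step feeds the raw prediction back in,
    so errors can accumulate over long horizons. With `bootstrap`
    (a hypothetical re-grounding function), each predicted latent is
    blended with a corrected latent to damp drift.
    """
    z = np.asarray(z0, dtype=float)
    trajectory = []
    for a in actions:
        z = dynamics(z, a)                       # one-step prediction
        if bootstrap is not None:
            z = (1 - alpha) * z + alpha * bootstrap(z)  # re-anchor latent
        trajectory.append(z)
    return trajectory

# Example: a 20-step imagined rollout with a simple clipping "bootstrap".
z0 = np.zeros(4)
actions = [rng.standard_normal(4) for _ in range(20)]
imagined = rollout(z0, actions, bootstrap=lambda z: np.clip(z, -0.5, 0.5))
```

In a real system, the bootstrap step would be learned or derived from the model itself rather than a fixed clip; the point of the sketch is only the blend of predicted and re-grounded latents at every step of the imagined trajectory.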

Key facts

  • Sword is a robust World Model framework for VLA policy post-training.
  • Existing world models on the LIBERO benchmark show poor generalization and long-horizon error accumulation.
  • Minor visual perturbations cause cascading hallucinations in closed-loop rollouts.
  • Sword uses dynamic latent bootstrapping to improve simulator reliability over long rollouts.
  • The method enables policy optimization entirely within imagination.
  • The paper is published on arXiv with ID 2605.07288.
  • The approach targets post-training of Vision-Language-Action models.
  • Sword addresses sensitivity to color and illumination changes.

Entities

Institutions

  • arXiv

Sources