D-VLA: Distributed RL Framework for Embodied AI Models
Researchers propose D-VLA, a high-concurrency distributed reinforcement learning framework for Vision-Language-Action (VLA) models in Embodied AI. The framework addresses systemic bottlenecks caused by resource conflicts between high-fidelity physical simulation and the VRAM and bandwidth demands of deep learning. D-VLA introduces "Plane Decoupling" to physically isolate the high-frequency training-data path from low-frequency weight control, eliminating interference between simulation and optimization. A four-thread asynchronous "Swimlane" pipeline enables full parallel overlap of sampling and training. The work is detailed in the preprint arXiv:2605.13276.
Key facts
- D-VLA is a distributed RL framework for VLA models
- Addresses bottlenecks from simulation and deep learning resource conflicts
- Introduces Plane Decoupling to isolate training data and weight control
- Uses a four-thread asynchronous Swimlane pipeline
- Published as arXiv:2605.13276
- Focuses on large-scale embodied foundation models
- Aims for high concurrency and low latency
- Targets Embodied AI applications
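The summary does not specify how the Swimlane lanes are divided, so the following is only an illustrative sketch of the general idea: a bounded high-frequency rollout queue (the data plane) kept separate from a single-slot weight queue (the control plane), with sampling, batching, and training overlapping on independent threads. All names (`sampler`, `batcher`, `trainer`, `publish`) are hypothetical, not from the paper; weight publication is folded into the trainer for brevity.

```python
import queue
import threading

# Illustrative pipeline (names are assumptions, not from D-VLA):
# sample -> batch -> train, with weights flowing back on a separate plane.
# The rollout/batch queues are the high-frequency "data plane"; the
# single-slot weight queue is the low-frequency "control plane", kept
# separate so trainer backpressure never stalls the simulator.

STOP = object()
rollout_q = queue.Queue(maxsize=16)  # data plane: per-step rollouts
batch_q = queue.Queue(maxsize=4)     # data plane: batched rollouts
weight_q = queue.Queue(maxsize=1)    # control plane: latest weights only
trained_batches = []

def sampler(num_steps):
    """Lane 1: simulate and emit rollouts, picking up new weights opportunistically."""
    weights = 0
    for step in range(num_steps):
        try:
            weights = weight_q.get_nowait()  # never block the simulator
        except queue.Empty:
            pass
        rollout_q.put((step, weights))
    rollout_q.put(STOP)

def batcher(batch_size):
    """Lane 2: group rollouts into fixed-size training batches."""
    batch = []
    while True:
        item = rollout_q.get()
        if item is STOP:
            if batch:
                batch_q.put(batch)
            batch_q.put(STOP)
            return
        batch.append(item)
        if len(batch) == batch_size:
            batch_q.put(batch)
            batch = []

def trainer():
    """Lane 3: consume batches and publish updated weights after each step."""
    version = 0
    while True:
        batch = batch_q.get()
        if batch is STOP:
            return
        version += 1
        trained_batches.append(batch)
        publish(version)

def publish(version):
    """Control-plane update: overwrite the single slot with the newest weights."""
    try:
        weight_q.get_nowait()  # drop stale weights instead of queueing them
    except queue.Empty:
        pass
    weight_q.put(version)

threads = [
    threading.Thread(target=sampler, args=(12,)),
    threading.Thread(target=batcher, args=(4,)),
    threading.Thread(target=trainer),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(trained_batches))  # 12 rollouts at batch size 4 -> 3 batches
```

The single-slot weight queue is the key design choice here: the simulator only ever reads the most recent weights and never waits for them, which mirrors the claimed isolation of high-frequency data traffic from low-frequency weight control.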