DEFLECT: Delay-Robust Execution for VLA Policies via Flow-Matching Likelihood-Estimated Counterfactual Tuning
DEFLECT addresses prediction-execution misalignment in Vision-Language-Action (VLA) policies that run asynchronous inference. In these systems the robot executes a previously predicted action chunk while the model computes the next one, so each new chunk is conditioned on observations that are already outdated by the time it runs. The resulting misalignment can be catastrophic: on Kinetix, naive asynchronous rollout performance collapses from 89% to under 1% when the inference cycle spans up to seven control steps. DEFLECT is a fully offline post-training refinement that converts latency into a label-free preference signal. It constructs counterfactual fresh/stale action pairs from a frozen reference policy and scores them with an implicit flow-matching likelihood-ratio surrogate, so no human labels or reward models are needed. It applies as a near drop-in upgrade to existing async-VLA stacks.
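To make the staleness concrete, the toy sketch below counts how old the conditioning observation is at each control step. The schedule it assumes (a new inference launched every `stride` steps, each taking `delay` steps to finish) and the names `observation_ages`, `delay`, and `stride` are illustrative assumptions, not details taken from DEFLECT itself.

```python
def observation_ages(num_steps, delay, stride):
    """Age, in control steps, of the observation behind each executed action.

    Assumed toy schedule: inference k starts at step k * stride using the
    observation from that step, and its chunk becomes active `delay` steps later.
    """
    ages = []
    # Start counting at the first step where a finished chunk is available.
    for t in range(delay, delay + num_steps):
        k = (t - delay) // stride      # most recent chunk whose inference has finished
        ages.append(t - k * stride)    # steps elapsed since that chunk's observation
    return ages


print(observation_ages(6, delay=0, stride=3))  # [0, 1, 2, 0, 1, 2]
print(observation_ages(6, delay=2, stride=3))  # [2, 3, 4, 2, 3, 4]
```

With zero delay the first action of every chunk uses a current observation. With a nonzero delay every executed action is at least `delay` steps stale, which is the gap the fresh/stale comparison targets.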
Key facts
- DEFLECT addresses prediction-execution misalignment in VLA policies
- Naive asynchronous rollout collapses from 89% to under 1% on Kinetix
- Inference cycle covers up to seven control steps
- DEFLECT is a fully offline post-training refinement
- Converts latency into a label-free preference signal
- Constructs counterfactual fresh/stale action pairs from a frozen reference policy
- Scores pairs using an implicit flow-matching likelihood-ratio surrogate
- No human labels or reward models required
- Applies as a near drop-in upgrade to existing async-VLA stacks
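The pair construction and scoring steps listed above might be sketched as follows. This is a minimal illustration under stated assumptions, not DEFLECT's actual objective: it assumes the fresh action comes from querying the frozen reference policy on the current observation and the stale one from an observation `delay` steps older, and it assumes a DPO-style logistic loss in which a lower flow-matching regression error than the reference stands in for a higher likelihood ratio. All names (`make_pair`, `fm_error`, `preference_loss`, `beta`) are hypothetical.

```python
import numpy as np


def make_pair(ref_sample, observations, t, delay):
    """Assumed pair construction: same frozen policy, fresh vs. delayed observation."""
    fresh = ref_sample(observations[t])
    stale = ref_sample(observations[max(t - delay, 0)])
    return fresh, stale


def fm_error(velocity_fn, obs, action, noise, t):
    """Flow-matching regression error for one (action, noise, t) sample."""
    x_t = (1.0 - t) * noise + t * action   # point on the straight noise-to-action path
    target = action - noise                # constant velocity along that path
    pred = velocity_fn(x_t, t, obs)
    return float(np.sum((pred - target) ** 2))


def preference_loss(policy_fn, ref_fn, obs, fresh, stale, noise, t, beta=1.0):
    """Assumed DPO-style loss preferring the fresh action over the stale one."""
    # How much better the policy fits each action than the frozen reference does.
    gain_fresh = fm_error(ref_fn, obs, fresh, noise, t) - fm_error(policy_fn, obs, fresh, noise, t)
    gain_stale = fm_error(ref_fn, obs, stale, noise, t) - fm_error(policy_fn, obs, stale, noise, t)
    margin = beta * (gain_fresh - gain_stale)
    return float(np.logaddexp(0.0, -margin))  # equals -log(sigmoid(margin))
```

Under these assumptions the loss equals log 2 when the policy matches the reference, and it drops below that value once the policy fits the fresh action better than the stale one relative to the reference. No reward model or human label enters the computation; the only supervision is which observation was current.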