DEFLECT: Delay-Robust Execution for VLA Policies via Flow-Matching Likelihood-Estimated Counterfactual Tuning

ai-technology · 2026-05-20

DEFLECT targets prediction-execution misalignment in Vision-Language-Action (VLA) policies that use asynchronous inference. In these systems the robot executes a previously predicted action chunk while the model computes the next one, so each new chunk is conditioned on an observation that is already stale by the time it runs. The resulting misalignment can be catastrophic: naive asynchronous rollout collapses from 89% to under 1% on Kinetix when the inference cycle spans up to seven control steps. DEFLECT is a fully offline post-training refinement that converts latency into a label-free preference signal. It constructs counterfactual fresh/stale action pairs from a frozen reference policy and scores them with an implicit flow-matching likelihood-ratio surrogate, so no human labels or reward models are required. The method applies as a near drop-in upgrade to existing async-VLA stacks.
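To make the staleness problem concrete, the sketch below computes how old the conditioning observation is for every executed action under a simple timing model. The model is an assumption for illustration, not taken from the paper: inference for each chunk is launched `delay` control steps before that chunk starts executing, and the names `observation_age`, `chunk_len`, and `delay` are hypothetical.

```python
def observation_age(total_steps, chunk_len, delay):
    """Age, in control steps, of the observation behind each executed action.

    Assumed timing model (illustrative only): the chunk that starts executing
    at step s was computed from the observation taken at step max(s - delay, 0).
    """
    ages = []
    for t in range(total_steps):
        # First step of the chunk that step t belongs to.
        chunk_start = (t // chunk_len) * chunk_len
        # Inference for this chunk was launched `delay` steps earlier.
        obs_step = max(chunk_start - delay, 0)
        ages.append(t - obs_step)
    return ages


if __name__ == "__main__":
    # Synchronous baseline: every chunk starts from a fresh observation.
    print(observation_age(total_steps=8, chunk_len=4, delay=0))
    # Asynchronous execution: the second chunk is already 3 steps stale at its first action.
    print(observation_age(total_steps=8, chunk_len=4, delay=3))
```

Under this model a delay of `d` steps shifts the age of every action after the first chunk up by `d`, which is the gap between what the policy saw and what the robot is doing when the action executes.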

Key facts

DEFLECT addresses prediction-execution misalignment in VLA policies
Naive asynchronous rollout collapses from 89% to under 1% on Kinetix
Inference cycle covers up to seven control steps
DEFLECT is a fully offline post-training refinement
Converts latency into a label-free preference signal
Constructs counterfactual fresh/stale action pairs from a frozen reference policy
Scores pairs using an implicit flow-matching likelihood-ratio surrogate
No human labels or reward models required
Applies as a near drop-in upgrade to existing async-VLA stacks
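
The summary does not give the exact form of the likelihood-ratio surrogate, so the following is only a minimal sketch of one plausible reading. It assumes a DPO-style logistic loss in which the log-likelihood ratio between the trained policy and the frozen reference is approximated by the difference in their conditional flow-matching errors on a linear noise-to-action path. All names (`fm_error`, `preference_loss`, `beta`) and the loss form itself are assumptions, not DEFLECT's published objective.

```python
import math


def fm_error(velocity_fn, obs, action, noise, t):
    """Squared conditional flow-matching error for one (action, noise, t) sample."""
    # Point on the assumed linear path from noise (t=0) to the action (t=1).
    x_t = [(1 - t) * n + t * a for n, a in zip(noise, action)]
    # Target velocity along that path.
    target = [a - n for n, a in zip(noise, action)]
    pred = velocity_fn(obs, x_t, t)
    return sum((p - u) ** 2 for p, u in zip(pred, target))


def preference_loss(policy, reference, obs, fresh, stale, noise, t, beta=1.0):
    """Assumed DPO-style loss that prefers the fresh action over the stale one.

    Lower flow-matching error stands in for higher likelihood, so
    (reference error - policy error) plays the role of an implicit log-ratio.
    """
    gain_fresh = fm_error(reference, obs, fresh, noise, t) - fm_error(policy, obs, fresh, noise, t)
    gain_stale = fm_error(reference, obs, stale, noise, t) - fm_error(policy, obs, stale, noise, t)
    margin = beta * (gain_fresh - gain_stale)
    # Numerically stable -log(sigmoid(margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    fresh, stale = [1.0, 0.0], [0.0, 0.0]

    def reference(obs, x_t, t):
        # Toy frozen reference: always predicts zero velocity.
        return [0.0 for _ in x_t]

    def policy(obs, x_t, t):
        # Toy policy whose velocity field points exactly at the fresh action.
        return [(a - x) / (1 - t) for a, x in zip(fresh, x_t)]

    loss = preference_loss(policy, reference, None, fresh, stale, noise=[0.0, 0.0], t=0.5)
    print(loss)
```

In this toy setup the policy matches the fresh action better than the reference does and the stale action worse, so the loss falls below the `log 2` value obtained when policy and reference are identical. In practice `noise` and `t` would be sampled per training example; they are passed in explicitly here to keep the sketch deterministic.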

Entities

—

Sources

arXiv cs.AI — 2026-05-20