Adaptive Commitment Depth Boosts Long-Horizon Vision-Language Reasoning
A new arXiv preprint (2605.09860) treats commitment depth, the number of primitive actions a policy executes before replanning, as a learnable, state-conditioned variable within a vision-language policy. On Sliding Puzzle and Sokoban, this adaptive approach outperforms fixed-depth baselines, achieving solve rates up to 12.5 percentage points higher while using about 25% fewer primitive actions per episode. Despite relying on only a 7B backbone, the system outperforms GPT-4o.
Key facts
- Commitment depth is formalized as the number of primitive actions executed open-loop between replans.
- Most existing long-horizon systems fix commitment depth as a hand-designed scalar.
- The proposed method treats commitment depth as a learnable, state-conditioned variable of the policy.
- The adaptive policy is instantiated within a model-native vision-language policy.
- The method Pareto-dominates every non-degenerate fixed-depth baseline on Sliding Puzzle and Sokoban.
- It achieves up to 12.5 percentage points higher solve rate.
- It uses approximately 25% fewer primitive actions per episode.
- Despite using only a 7B backbone, the system outperforms GPT-4o.
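The mechanism in the facts above can be illustrated with a minimal control loop: the policy returns both a plan and a state-conditioned commitment depth, and the executor runs that many actions open-loop before replanning. This is a toy sketch only; `ToyEnv`, `toy_policy`, and the 1-D goal-reaching task are illustrative stand-ins, not the paper's environment or its vision-language model.

```python
# Sketch of an adaptive-commitment-depth control loop.
# ToyEnv and toy_policy are hypothetical stand-ins for illustration,
# not the paper's actual benchmark or VLM policy.

class ToyEnv:
    """1-D goal-reaching task: move from 0 toward `goal` in +/-1 steps."""
    def __init__(self, goal=7):
        self.state, self.goal = 0, goal

    def step(self, action):
        """Apply an action in {-1, +1}; return (state, done)."""
        self.state += action
        return self.state, self.state == self.goal

def toy_policy(state, goal):
    """Return a plan plus a state-conditioned commitment depth.

    Far from the goal it commits to a longer open-loop burst;
    near the goal it commits to a single step, keeping replanning
    responsive where precision matters.
    """
    remaining = goal - state
    direction = 1 if remaining > 0 else -1
    depth = max(1, min(abs(remaining) // 2, 4))  # adaptive, not a fixed scalar
    return [direction] * depth, depth

def rollout(env, max_replans=20):
    """Execute `depth` primitive actions open-loop between replans."""
    actions, replans = 0, 0
    for _ in range(max_replans):
        plan, depth = toy_policy(env.state, env.goal)
        replans += 1
        for a in plan[:depth]:       # open-loop commitment window
            _, done = env.step(a)
            actions += 1
            if done:
                return True, actions, replans
    return False, actions, replans

solved, n_actions, n_replans = rollout(ToyEnv(goal=7))
```

A degenerate fixed depth of 1 would replan after every action (maximally reactive but expensive), while a large fixed depth risks long blind bursts; conditioning depth on the state is what lets the learned policy trade off the two per situation.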