ARTFEED — Contemporary Art Intelligence

Adaptive Commitment Depth Boosts Long-Horizon Vision-Language Reasoning

other · 2026-05-12

A new method, described in an arXiv preprint (2605.09860), treats commitment depth (the number of actions executed open-loop before replanning) as a learnable, state-conditioned variable within a vision-language policy. The adaptive approach Pareto-dominates fixed-depth baselines on Sliding Puzzle and Sokoban, achieving up to 12.5 percentage points higher solve rates while using roughly 25% fewer primitive actions per episode. Built on a 7B backbone, the system outperforms GPT-4o.

Key facts

  • Commitment depth is formalized as the number of primitive actions executed open-loop between replans.
  • Most existing long-horizon systems fix commitment depth as a hand-designed scalar.
  • The proposed method treats commitment depth as a learnable, state-conditioned variable of the policy.
  • The adaptive policy is instantiated within a model-native vision-language policy.
  • The method Pareto-dominates every non-degenerate fixed-depth baseline on Sliding Puzzle and Sokoban.
  • It achieves up to 12.5 percentage points higher solve rate.
  • It uses approximately 25% fewer primitive actions per episode.
  • The system uses a 7B backbone and outperforms GPT-4o.
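The control loop behind these facts can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: `plan`, `step`, and the uncertainty heuristic are hypothetical stand-ins for the vision-language policy and environment. The key idea it demonstrates is that the policy returns both an action sequence and a state-conditioned commitment depth, and the agent executes that many actions open-loop before replanning.

```python
import random

def plan(state):
    """Hypothetical policy call: returns an action plan plus a
    state-conditioned commitment depth, i.e. how many of those
    actions to execute open-loop before the next replan."""
    actions = [random.choice(["up", "down", "left", "right"]) for _ in range(8)]
    # Stand-in heuristic: commit deeper when the state looks predictable.
    depth = 1 if state["uncertainty"] > 0.5 else 4
    return actions, depth

def step(state, action):
    """Hypothetical environment transition (placeholder dynamics)."""
    return {"t": state["t"] + 1, "uncertainty": random.random()}

def run_episode(state, max_steps=20):
    """Adaptive-depth loop: replan, commit for `depth` steps, repeat."""
    replans, executed = 0, 0
    while state["t"] < max_steps:
        actions, depth = plan(state)
        replans += 1
        for action in actions[:depth]:  # open-loop execution segment
            state = step(state, action)
            executed += 1
            if state["t"] >= max_steps:
                break
    return replans, executed

random.seed(0)
replans, executed = run_episode({"t": 0, "uncertainty": 1.0})
print(f"{replans} replans for {executed} primitive actions")
```

A fixed-depth baseline would hard-code `depth`; the claimed efficiency gain comes from spending replans only where the state warrants them, so the replan count falls below the action count whenever the policy commits more than one step at a time.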

Entities

Institutions

  • arXiv

Sources