ARTFEED — Contemporary Art Intelligence

Weakly-Supervised Video Grounding as a Game

other · 2026-05-27

Researchers propose a game-theoretic approach to weakly-supervised video temporal grounding. In this task, a model must locate the segment of an untrimmed video that matches a natural-language query, but its training data pairs videos with queries without giving start and end timestamps. Existing frameworks select among moment proposals using contrastive learning and reconstruction. They have two limitations: their cross-modal learning is only coarse-grained, and they depend on complex moment proposals. The new method instead models fine-grained alignment between individual video frames and individual query words, and it removes the need for predefined proposals. The authors describe this as the first attempt to frame the task as a game, with the aim of improving grounding without costly proposal generation.
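To make "fine-grained alignment" concrete, here is a minimal sketch. It assumes that frames and query words have already been encoded as vectors of the same dimension by some unspecified encoder. The code builds a frame-by-word cosine-similarity matrix and reduces it to one relevance score per frame. It is a generic illustration of frame-to-word alignment, not the authors' game-theoretic formulation, which the summary does not describe. The function names and the max-over-words reduction are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def frame_word_similarity(frames: List[Vector], words: List[Vector]) -> List[List[float]]:
    """Fine-grained alignment: one similarity score for every (frame, word) pair.

    Returns a T x W matrix, where T is the number of frames and W the number of words.
    """
    return [[cosine(f, w) for w in words] for f in frames]


def frame_relevance(sim: List[List[float]]) -> List[float]:
    """Hypothetical reduction: a frame's relevance is its best match over all query words."""
    return [max(row) for row in sim]


# Toy 2-D embeddings (assumed, not real model outputs): frames 1 and 2 point
# roughly the same way as the query words; frames 0 and 3 do not.
frames = [[0.0, 1.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [0.8, -0.2]]
sim = frame_word_similarity(frames, words)
print([round(s, 2) for s in frame_relevance(sim)])  # prints [0.0, 1.0, 1.0, 0.0]
```

Keeping the full T x W matrix, rather than pooling the query into a single sentence vector first, preserves which word each frame matches. That word-level detail is exactly what a coarse-grained, sentence-level comparison loses.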

Key facts

  • Task: weakly-supervised video temporal grounding
  • Existing methods use moment proposal selection with contrastive learning and reconstruction
  • Two issues identified: coarse-grained cross-modal learning and complex moment proposals
  • Proposed method: approaches the task from a game perspective, described as the first to do so
  • Aims to capture detailed consistency between video frames and query words
  • Eliminates reliance on predefined moment proposals
  • Source: arXiv preprint 2605.26441
  • Published on arXiv
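One generic way to localize a moment without predefined proposals is to score every frame and then select the contiguous run of frames with the highest total score. The sketch below takes that approach, applying a maximum-sum subarray search (Kadane's algorithm) to per-frame scores after subtracting a threshold. Both the thresholding idea and the default value of 0.5 are assumptions made for illustration, and this is not claimed to be the method in preprint 2605.26441.

```python
from typing import List, Tuple


def best_segment(scores: List[float], threshold: float = 0.5) -> Tuple[int, int]:
    """Return inclusive (start, end) frame indices of the contiguous segment
    whose (score - threshold) values have the largest sum.

    Frames above the threshold add to a segment and frames below it subtract,
    so no candidate windows (proposals) are enumerated in advance.
    Runs in O(T) time for T frames via Kadane's maximum-subarray algorithm.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    best_sum = float("-inf")
    best = (0, 0)
    run_sum = 0.0
    run_start = 0
    for i, s in enumerate(scores):
        gain = s - threshold
        # Start a new run here if carrying the previous run forward would not help.
        if run_sum <= 0.0:
            run_sum = gain
            run_start = i
        else:
            run_sum += gain
        if run_sum > best_sum:
            best_sum = run_sum
            best = (run_start, i)
    return best


# Hypothetical per-frame relevance scores for a 10-frame clip.
frame_scores = [0.1, 0.2, 0.7, 0.9, 0.4, 0.8, 0.6, 0.2, 0.1, 0.3]
print(best_segment(frame_scores))  # prints (2, 6)
```

In this example, frame 4 falls slightly below the threshold but stays inside the segment, because its neighbours score high enough to outweigh the small penalty. A fixed sliding-window proposal scheme would have to enumerate many window lengths to find the same span.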

Entities

Institutions

  • arXiv

Sources