ARTFEED — Contemporary Art Intelligence

ConTrans: New Architecture for Zero-Shot Action Localization

other · 2026-06-01

A research paper introduces ConTrans, a multi-scale encoder architecture for Zero-shot Temporal Action Localization (ZS-TAL), the task of detecting and localizing actions in untrimmed videos when those action classes were not seen during training. The method combines convolutional inductive biases with transformer self-attention so that the encoder captures both fine-grained local dependencies and long-range global context. According to the paper, existing approaches fall short in two ways: they neglect local correlations that depend on the relative offset between time steps, and they rely on shallow network architectures. Experiments on the ActivityNet-1.3 and THUMOS datasets are reported to show improved feature representations.
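The general idea of pairing a local convolution with global self-attention across several temporal scales can be illustrated with a minimal pure-Python sketch. This is not the ConTrans implementation: the function names (`local_conv`, `self_attention`, `hybrid_block`, `multi_scale_encode`), the fixed smoothing kernel, the identity query/key/value projections, the residual sum, and the stride-2 average pooling are all assumptions chosen for illustration, since the summary does not describe the paper's actual layers.

```python
import math

def local_conv(x, kernel):
    """Depthwise temporal convolution with zero padding.
    Each weight depends only on the relative offset between time steps."""
    half = len(kernel) // 2
    T, D = len(x), len(x[0])
    out = []
    for t in range(T):
        row = [0.0] * D
        for k, w in enumerate(kernel):
            s = t + k - half          # source time step for this offset
            if 0 <= s < T:            # positions outside the clip count as zero
                for d in range(D):
                    row[d] += w * x[s][d]
        out.append(row)
    return out

def self_attention(x):
    """Scaled dot-product self-attention over all time steps.
    Assumption: identity Q/K/V projections, single head."""
    D = len(x[0])
    scale = 1.0 / math.sqrt(D)
    out = []
    for q in x:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in x]
        m = max(scores)                              # stabilise the softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[d] for w, v in zip(weights, x)) for d in range(D)])
    return out

def hybrid_block(x, kernel=(0.25, 0.5, 0.25)):
    """Residual sum of the input, a local (conv) branch and a global (attention) branch."""
    local = local_conv(x, kernel)
    glob = self_attention(x)
    return [[xi + li + gi for xi, li, gi in zip(xr, lr, gr)]
            for xr, lr, gr in zip(x, local, glob)]

def downsample(x):
    """Halve the temporal resolution by averaging adjacent pairs of time steps."""
    return [[(a + b) / 2 for a, b in zip(x[i], x[i + 1])]
            for i in range(0, len(x) - 1, 2)]

def multi_scale_encode(x, levels=3):
    """Apply a hybrid block at each scale and return the feature pyramid."""
    pyramid = []
    for _ in range(levels):
        x = hybrid_block(x)
        pyramid.append(x)
        if len(x) < 2:
            break
        x = downsample(x)
    return pyramid

# Toy input: 8 time steps, 2 feature channels (hypothetical values).
features = [[float(t), 1.0] for t in range(8)]
pyramid = multi_scale_encode(features, levels=3)
print([len(level) for level in pyramid])
```

In this sketch the convolution branch sees only a three-step neighbourhood, while the attention branch mixes information from every time step, and the pyramid exposes both at progressively coarser temporal resolutions. A real encoder would use learned kernels and projections, normalization, and multiple heads.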

Key facts

  • ConTrans integrates convolutional inductive biases with transformer self-attention.
  • It captures fine-grained local dependencies and long-range global context.
  • It addresses limitations of existing ZS-TAL methods, which neglect relative-offset-based local correlations and rely on shallow network architectures.
  • It was evaluated on the ActivityNet-1.3 and THUMOS datasets.
  • It aims to detect and localize unseen actions in untrimmed videos.
