ARTFEED — Contemporary Art Intelligence

STAR: Semantic-Temporal Adaptive Learning for Few-Shot Action Recognition

ai-technology · 2026-05-14

A new framework, Semantic Temporal Adaptive Representation Learning (STAR), addresses few-shot action recognition (FSAR) by tackling two problems: semantic-temporal misalignment and multi-scale temporal dynamics. STAR pairs a semantic-alignment component, built around a Temporal Semantic Attention (TSA) mechanism, with a temporal-aware component that adapts Mamba's sequence modeling to FSAR. The goal is better generalization to novel action categories from limited annotated samples, addressing two limitations of existing vision-language approaches: reliance on static textual prompts, and inadequate modeling of both short-term and long-range temporal dependencies. The work is available on arXiv under identifier 2605.13202.
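This digest does not detail STAR's actual architecture, but the general idea of a temporal semantic attention step can be illustrated as cross-attention in which a class-text embedding weights per-frame video features. Everything below (function names, shapes) is an illustrative sketch of that general pattern, not the paper's code:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_semantic_attention(text_emb, frame_feats):
    """Sketch of semantic-guided temporal attention.

    text_emb:    (d,)   embedding of a class-name prompt
    frame_feats: (T, d) per-frame visual features
    Returns a (d,) video representation in which frames more
    relevant to the class semantics receive higher weight.
    """
    d = text_emb.shape[-1]
    scores = frame_feats @ text_emb / np.sqrt(d)  # (T,) scaled dot products
    weights = softmax(scores)                     # attention over time
    return weights @ frame_feats                  # weighted frame average, (d,)
```

The text embedding acts as the query and the frames as keys/values, so the pooled representation adapts per class rather than using a fixed temporal average.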

Key facts

  • STAR is a unified framework for few-shot action recognition.
  • It addresses semantic-temporal misalignment in vision-language models.
  • The framework includes a semantic-alignment component with Temporal Semantic Attention (TSA).
  • It adapts Mamba's sequence modeling capability for FSAR.
  • The approach targets multi-scale temporal dynamics including short-term and long-range dependencies.
  • The paper is available on arXiv with ID 2605.13202.
  • The method aims to generalize to novel action categories from few samples.
  • Existing approaches suffer from oversmoothing or fragmentation of temporal cues.
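FSAR methods like STAR are typically evaluated episodically: a query video is matched against class prototypes averaged from a handful of labeled support videos. A minimal prototype-matching sketch of that standard protocol follows (this is the common evaluation setup, not STAR's specific classifier):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length for cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def classify_query(support_feats, support_labels, query_feat, n_way):
    """Assign a query video to the nearest class prototype.

    support_feats:  (n_way * k_shot, d) features of support videos
    support_labels: (n_way * k_shot,)   class indices in [0, n_way)
    query_feat:     (d,)                feature of the query video
    """
    prototypes = np.stack([
        support_feats[support_labels == c].mean(axis=0)
        for c in range(n_way)
    ])                                              # (n_way, d)
    sims = l2_normalize(prototypes) @ l2_normalize(query_feat)
    return int(sims.argmax())
```

With only a few shots per class, the quality of the video representation fed into this matcher, which is what STAR's semantic and temporal components target, largely determines accuracy.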

Entities

Institutions

  • arXiv
