STAR: Semantic-Temporal Adaptive Learning for Few-Shot Action Recognition
Semantic Temporal Adaptive Representation Learning (STAR) is a framework for few-shot action recognition (FSAR) that tackles two limitations of existing vision-language approaches: semantic-temporal misalignment caused by reliance on static textual prompts, and inadequate modeling of multi-scale temporal dynamics spanning both short-term and long-range dependencies. STAR combines a semantic-alignment component built around a Temporal Semantic Attention (TSA) mechanism with a temporal-aware component that adapts Mamba's sequence modeling to FSAR. Together, these components aim to improve generalization to novel action categories from limited annotated samples. The work is published on arXiv under identifier 2605.13202.
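The summary does not give TSA's exact formulation, but the general idea of scoring per-frame visual features against a class-prompt embedding can be illustrated with a minimal cross-attention sketch. Everything below (function names, the dot-product scoring, the pooling) is an assumed, generic attention pattern, not the authors' implementation:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def temporal_semantic_attention(text_emb, frame_feats):
    """Attend a text embedding over per-frame visual features.

    text_emb:    list[float], a class-prompt embedding (the query).
    frame_feats: list of list[float], one feature vector per frame
                 (keys and values).
    Returns a temporally pooled clip feature in which frames that are
    semantically closer to the prompt receive higher weight.
    """
    d = len(text_emb)
    # Scaled dot-product relevance of each frame to the prompt.
    scores = [dot(text_emb, f) / math.sqrt(d) for f in frame_feats]
    weights = softmax(scores)
    # Weighted sum of frame features -> semantically aligned clip feature.
    return [sum(w * f[i] for w, f in zip(weights, frame_feats))
            for i in range(len(frame_feats[0]))]
```

With a prompt embedding aligned to the first frame's feature, the pooled output leans toward that frame, which is the alignment behavior a TSA-style mechanism is meant to provide.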
Key facts
- STAR is a unified framework for few-shot action recognition.
- It addresses semantic-temporal misalignment in vision-language models.
- The framework includes a semantic-alignment component with Temporal Semantic Attention (TSA).
- It adapts Mamba's sequence modeling capability for FSAR.
- The approach targets multi-scale temporal dynamics including short-term and long-range dependencies.
- The paper is available on arXiv with ID 2605.13202.
- The method aims to generalize to novel action categories from few samples.
- Existing approaches suffer from oversmoothing or fragmentation of temporal cues.
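The facts above mention adapting Mamba's sequence modeling without detailing the adaptation. As a hedged illustration only, the core of Mamba-style models is a selective state-space recurrence whose gates depend on the input, letting the model emphasize or suppress individual timesteps (here, frames). The scalar recurrence and sigmoid gate below are stand-ins for Mamba's learned selection functions, not the paper's architecture:

```python
import math

def selective_scan(xs, a=0.9):
    """Minimal scalar illustration of a selective state-space recurrence.

    h_t = a * h_{t-1} + b_t * x_t,   y_t = c_t * h_t,
    where the gates b_t and c_t depend on the input x_t (the 'selective'
    part). The decaying state h carries long-range context across the
    sequence, while the gates react to short-term content.
    """
    h = 0.0
    ys = []
    for x in xs:
        # Input-dependent gate: a sigmoid of the input stands in for
        # Mamba's learned selection functions (an assumption here).
        b = 1.0 / (1.0 + math.exp(-x))
        c = b
        h = a * h + b * x
        ys.append(c * h)
    return ys
```

The recurrent state gives the long-range dependency modeling, while the per-step gates address short-term dynamics, which is the multi-scale behavior the summary attributes to STAR's temporal-aware component.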
Entities
Institutions
- arXiv