STAR: Semantic-Temporal Adaptive Learning for Few-Shot Action Recognition
Semantic Temporal Adaptive Representation Learning (STAR) is a framework for few-shot action recognition (FSAR) that tackles two limitations of existing vision-language approaches: semantic-temporal misalignment caused by reliance on static textual prompts, and inadequate modeling of multi-scale temporal dynamics spanning both short-term and long-range dependencies. STAR combines a semantic-alignment component built around a Temporal Semantic Attention (TSA) mechanism with a temporal-aware component that adapts Mamba's sequence modeling to FSAR. Together, these components aim to improve generalization to novel action categories from limited annotated samples. The work is published on arXiv under identifier 2605.13202.
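The summary does not give TSA's exact formulation, but the general idea of scoring per-frame visual features against a class-prompt embedding can be illustrated with a minimal cross-attention sketch. Everything below (function names, the dot-product scoring, the pooling) is an assumed, generic attention pattern, not the authors' implementation:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def temporal_semantic_attention(text_emb, frame_feats):
    """Attend a text embedding over per-frame visual features.

    text_emb:    list[float], a class-prompt embedding (the query).
    frame_feats: list of list[float], one feature vector per frame
                 (keys and values).
    Returns a temporally pooled clip feature in which frames that are
    semantically closer to the prompt receive higher weight.
    """
    d = len(text_emb)
    # Scaled dot-product relevance of each frame to the prompt.
    scores = [dot(text_emb, f) / math.sqrt(d) for f in frame_feats]
    weights = softmax(scores)
    # Weighted sum of frame features -> semantically aligned clip feature.
    return [sum(w * f[i] for w, f in zip(weights, frame_feats))
            for i in range(len(frame_feats[0]))]
```

With a prompt embedding aligned to the first frame's feature, the pooled output leans toward that frame, which is the alignment behavior a TSA-style mechanism is meant to provide.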
Key facts
- STAR is a unified framework for few-shot action recognition.
- It addresses semantic-temporal misalignment in vision-language models.
- The framework includes a semantic-alignment component with Temporal Semantic Attention (TSA).
- It adapts Mamba's sequence modeling capability for FSAR.
- The approach targets multi-scale temporal dynamics including short-term and long-range dependencies.
- The paper is available on arXiv with ID 2605.13202.
- The method aims to generalize to novel action categories from few samples.
- Existing approaches suffer from oversmoothing or fragmentation of temporal cues.
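The facts above mention adapting Mamba's sequence modeling without detailing the adaptation. As a hedged illustration only, the core of Mamba-style models is a selective state-space recurrence whose gates depend on the input, letting the model emphasize or suppress individual timesteps (here, frames). The scalar recurrence and sigmoid gate below are stand-ins for Mamba's learned selection functions, not the paper's architecture:

```python
import math

def selective_scan(xs, a=0.9):
    """Minimal scalar illustration of a selective state-space recurrence.

    h_t = a * h_{t-1} + b_t * x_t,   y_t = c_t * h_t,
    where the gates b_t and c_t depend on the input x_t (the 'selective'
    part). The decaying state h carries long-range context across the
    sequence, while the gates react to short-term content.
    """
    h = 0.0
    ys = []
    for x in xs:
        # Input-dependent gate: a sigmoid of the input stands in for
        # Mamba's learned selection functions (an assumption here).
        b = 1.0 / (1.0 + math.exp(-x))
        c = b
        h = a * h + b * x
        ys.append(c * h)
    return ys
```

The recurrent state gives the long-range dependency modeling, while the per-step gates address short-term dynamics, which is the multi-scale behavior the summary attributes to STAR's temporal-aware component.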
Entities
Institutions
- arXiv