LLM Backbones Enable Efficient Adaptive Human Activity Recognition
A recent study proposes reusing large pretrained language models (LLMs) as generic temporal backbones for sensor-based human activity recognition (HAR), rather than training task-specific Transformer models from scratch. The method introduces a structured convolutional projection that maps multivariate accelerometer and gyroscope signals into the LLM's latent space, bridging the gap between inertial time series and language. Because the pretrained backbone is kept frozen, the approach reduces training cost and data requirements while improving adaptability to domain shifts. The paper is available on arXiv under ID 2605.12019.
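To make the architecture concrete, below is a minimal sketch of what such a pipeline could look like. It is an illustrative assumption, not the paper's actual implementation: the choice of Hugging Face's GPT2Model as the frozen backbone, the two-layer Conv1d projection, the kernel sizes, the 6-channel IMU input (3-axis accelerometer plus 3-axis gyroscope), and the mean-pooled classification head are all hypothetical details filled in for demonstration.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model


class ConvProjectionHAR(nn.Module):
    """Hypothetical frozen-LLM HAR classifier with a convolutional projection.

    The 1-D convolutions turn raw 6-channel IMU windows into a sequence of
    token-like embeddings matching the LLM hidden size; only the projection
    and the classification head are trained.
    """

    def __init__(self, backbone: nn.Module, hidden_size: int,
                 num_classes: int, in_channels: int = 6):
        super().__init__()
        # Convolutional projection: sensor channels -> hidden_size, with
        # temporal downsampling so each output step summarizes a short patch.
        self.projection = nn.Sequential(
            nn.Conv1d(in_channels, hidden_size // 2, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
            nn.Conv1d(hidden_size // 2, hidden_size, kernel_size=5, stride=2, padding=2),
        )
        self.backbone = backbone
        for p in self.backbone.parameters():  # keep the pretrained LLM frozen
            p.requires_grad = False
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) raw inertial window
        tokens = self.projection(x).transpose(1, 2)            # (batch, seq, hidden)
        feats = self.backbone(inputs_embeds=tokens).last_hidden_state
        return self.head(feats.mean(dim=1))                    # mean-pool, then classify


# Usage example (shapes and class count are arbitrary):
backbone = GPT2Model.from_pretrained("gpt2")      # hidden size 768
model = ConvProjectionHAR(backbone, hidden_size=768, num_classes=6)
batch = torch.randn(4, 6, 128)                    # 4 windows, 6 IMU channels, 128 samples
logits = model(batch)                             # (4, 6)
```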
Key facts
- The paper proposes reusing LLMs as generic temporal backbones for HAR.
- A structured convolutional projection maps inertial signals to LLM latent space.
- The pretrained backbone is kept frozen, reducing training cost.
- The approach aims to improve adaptability to domain shifts.
- The paper is published on arXiv with ID 2605.12019.
- The method uses accelerometer and gyroscope data.
- It avoids training task-specific Transformers from scratch.
- The paradigm shift addresses the high training cost and data requirements of training from scratch (see the sketch after this list).
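As a rough illustration of why freezing the backbone cuts training cost, and continuing the hypothetical sketch above: only the projection and classification-head parameters enter the optimizer, so adapting to a new domain retrains a small fraction of the model. The parameter counts below refer to the assumed GPT-2 backbone, not figures reported in the paper.

```python
# Only the projection and classification head are trainable; the frozen GPT-2
# backbone (~124M parameters) is excluded from the optimizer entirely.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

n_total = sum(p.numel() for p in model.parameters())
n_train = sum(p.numel() for p in trainable)
print(f"trainable: {n_train:,} / {n_total:,} parameters")
```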
Entities
Institutions
- arXiv