MAEPose: Self-Supervised Human Pose Estimation from mmWave Video

ai-technology · 2026-05-04

Researchers have developed MAEPose, an approach to human pose estimation from millimetre-wave (mmWave) radar video that offers a privacy-preserving alternative to RGB-based methods. Existing techniques rely on pre-extracted intermediate representations such as sparse point clouds or spectrogram images, which discard rich spatiotemporal information and add system complexity; MAEPose instead operates directly on raw mmWave spectrogram videos. It uses masked autoencoding to learn motion-aware, generalized spatiotemporal representations from unlabelled radar video, and a heatmap decoder to predict poses across multiple frames. Evaluation across three datasets is reported to demonstrate its effectiveness. By learning robust representations from unlabelled data, the work addresses limitations of supervised end-to-end approaches and could advance privacy-conscious applications in healthcare, surveillance, and human-computer interaction.
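As a rough illustration of the masking step in masked autoencoding on video, the sketch below splits a clip into non-overlapping spatiotemporal patches ("tubelets") and hides a random subset, so that an encoder would see only the visible ones and a decoder would be trained to reconstruct the rest. The clip size, tubelet size, 75% mask ratio, and the function names `tubelet_grid` and `random_mask` are illustrative assumptions; the summary does not state the settings or masking strategy MAEPose actually uses.

```python
import random


def tubelet_grid(frames, height, width, t_size, p_size):
    """Return (t, row, col) grid indices of non-overlapping tubelets.

    Hypothetical helper: the tubelet shape is an assumption, not taken
    from the MAEPose summary.
    """
    if frames % t_size or height % p_size or width % p_size:
        raise ValueError("video dimensions must be divisible by the tubelet size")
    return [
        (t, r, c)
        for t in range(frames // t_size)
        for r in range(height // p_size)
        for c in range(width // p_size)
    ]


def random_mask(num_tokens, mask_ratio, seed=None):
    """Split token indices into (visible, masked) lists, both sorted."""
    rng = random.Random(seed)
    num_masked = int(num_tokens * mask_ratio)
    masked = set(rng.sample(range(num_tokens), num_masked))
    visible = [i for i in range(num_tokens) if i not in masked]
    return visible, sorted(masked)


# Assumed example: a 16-frame, 64x64 spectrogram clip with 2x16x16 tubelets.
tokens = tubelet_grid(frames=16, height=64, width=64, t_size=2, p_size=16)
visible, masked = random_mask(len(tokens), mask_ratio=0.75, seed=0)
print(len(tokens), len(visible), len(masked))  # 128 32 96
```

With these assumed settings the encoder would process only 32 of the 128 tubelets per clip. No labels are involved at this stage, which is why the pre-training can use unlabelled radar video.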

Key facts

MAEPose uses mmWave radar video for human pose estimation.
It operates directly on spectrogram videos, avoiding pre-extracted representations.
The method is self-supervised via masked autoencoding.
It learns spatiotemporal motion-aware representations from unlabelled data.
A heatmap decoder enables multi-frame pose estimation.
It was evaluated across three datasets.
It offers a privacy-preserving alternative to RGB-based methods.
It reduces system complexity compared to existing approaches.
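
To make the heatmap decoder's output concrete, here is a minimal sketch of one common way to turn per-joint heatmaps into keypoint coordinates: take the location of the peak score in each heatmap, for every joint in every frame. The summary does not say how MAEPose converts heatmaps to joint positions, so the peak-picking rule, the `decode_heatmaps` name, and the nested-list layout are assumptions for illustration only.

```python
def decode_heatmaps(heatmaps):
    """Pick the peak of each heatmap.

    heatmaps[frame][joint] is a 2D grid (list of rows) of scores.
    Returns keypoints[frame][joint] = (x, y, score) at the peak.
    Hypothetical helper, not the MAEPose implementation.
    """
    keypoints = []
    for frame in heatmaps:
        frame_points = []
        for grid in frame:
            # Scan every cell and keep the one with the highest score.
            score, x, y = max(
                ((s, col, row) for row, line in enumerate(grid) for col, s in enumerate(line)),
                key=lambda item: item[0],
            )
            frame_points.append((x, y, score))
        keypoints.append(frame_points)
    return keypoints


# Two frames, one joint each, on a toy 3x3 heatmap.
clip = [
    [[[0.0, 0.1, 0.0], [0.0, 0.9, 0.2], [0.0, 0.0, 0.0]]],
    [[[0.0, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.3, 0.8]]],
]
keypoints = decode_heatmaps(clip)
print(keypoints)  # [[(1, 1, 0.9)], [(2, 2, 0.8)]]
```

Decoding every frame of the clip in one pass is what makes the prediction multi-frame: the output is a short pose sequence rather than a single pose.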

Entities

—

Sources

arXiv cs.AI — 2026-05-04