xMAE: Masked Cross-Modal Reconstruction for Biosignal Learning

other · 2026-05-06

Researchers introduce xMAE, a self-supervised pretraining framework for biosignal representation learning. It applies masked cross-modal reconstruction to temporally ordered biosignals, such as ECG and PPG, so that the learned representations capture directional temporal dynamics. Pretraining with xMAE is reported to outperform unimodal and multimodal baselines, a gain attributed to enforcing physiologically meaningful timing structure.
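To make the idea of masked reconstruction concrete, here is a minimal NumPy sketch of the general recipe: split two paired signals into patches, hide most patches of one modality, and score a prediction only on the hidden patches. This is not the paper's architecture. The patch length, the 75% mask ratio, the 100 Hz sampling rate, the sinusoidal stand-in signals, and all function names are assumptions made for illustration, and the "model" is a placeholder that predicts zeros.

```python
import numpy as np

def patchify(signal, patch_len):
    """Split a 1-D signal into non-overlapping patches (drops any remainder)."""
    n = len(signal) // patch_len
    return signal[: n * patch_len].reshape(n, patch_len)

def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean mask; True marks patches hidden from the model."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_mse(pred, target, mask):
    """Mean squared error over the masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
t = np.arange(800) / 100.0                   # 8 s at an assumed 100 Hz
ecg = np.sin(2 * np.pi * 1.2 * t)            # synthetic stand-in for ECG
ppg = np.sin(2 * np.pi * 1.2 * (t - 0.2))    # stand-in for PPG, delayed 0.2 s

ecg_patches = patchify(ecg, 50)              # left fully visible in this sketch
ppg_patches = patchify(ppg, 50)              # reconstruction target
mask = random_patch_mask(len(ppg_patches), 0.75, rng)

# Placeholder prediction: a real model would encode the ECG patches and the
# visible PPG patches, then decode the hidden PPG patches.
pred = np.zeros_like(ppg_patches)
loss = masked_mse(pred, ppg_patches, mask)
print(f"masked patches: {mask.sum()} of {len(mask)}, loss: {loss:.3f}")
```

Scoring only the hidden patches is the usual masked-autoencoder convention; whether xMAE masks one modality, both, or uses a different objective is not stated in this summary.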

Key facts

xMAE uses masked cross-modal reconstruction for biosignal pretraining.
It models directional temporal dynamics between ECG and PPG signals.
ECG captures the electrical activation of the heartbeat; PPG records the peripheral pulse, which arrives later because of vascular dynamics.
The framework encourages physiologically meaningful timing structure in learned representations.
Pretraining with xMAE outperforms unimodal and multimodal approaches.
The paper is available on arXiv with ID 2605.00973.
The method treats biosignals as temporally ordered views of the same physiological process.
Existing self-supervised methods often overlook directional temporal dynamics.
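The timing relationship described above, with the PPG pulse trailing the ECG activation, can be illustrated on synthetic data. The sketch below builds two Gaussian pulse trains and recovers the delay between them with a plain cross-correlation. It only illustrates the directional lag and is not part of xMAE; the 100 Hz sampling rate, the 0.25 s delay, the pulse widths, and the helper names are all assumed values chosen for the example.

```python
import numpy as np

FS = 100  # assumed sampling rate in Hz

def pulse_train(t, beat_times, width_s):
    """Sum of Gaussian pulses centred on each beat time."""
    return sum(np.exp(-0.5 * ((t - b) / width_s) ** 2) for b in beat_times)

def estimate_lag_seconds(reference, delayed, fs):
    """Lag in seconds at which `delayed` best aligns with `reference`."""
    xcorr = np.correlate(delayed, reference, mode="full")
    lags = np.arange(-(len(reference) - 1), len(delayed))
    return lags[np.argmax(xcorr)] / fs

t = np.arange(1000) / FS                  # 10 s of samples
beats = 0.4 + 0.8 * np.arange(12)         # one beat every 0.8 s (75 bpm)
true_delay_s = 0.25                       # assumed ECG-to-PPG delay

ecg_like = pulse_train(t, beats, width_s=0.02)                 # sharp, early
ppg_like = pulse_train(t, beats + true_delay_s, width_s=0.06)  # broad, late

estimated = estimate_lag_seconds(ecg_like, ppg_like, FS)
print(f"estimated delay: {estimated:.2f} s")
```

A positive lag means the second signal follows the first, which is the ordering a direction-aware pretraining objective can exploit. Real recordings would need filtering and detrending before a step like this.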
