XAttnMark: Cross-Attention Audio Watermarking for Generative AI
Researchers have introduced XAttnMark (Cross-Attention Robust Audio Watermark), a neural method for embedding imperceptible watermarks in audio, aimed at copyright infringement and deepfake concerns. The system shares part of its parameters between the watermark generator and the detector, uses a cross-attention mechanism to retrieve the embedded message, and adds a temporal conditioning module that improves message distribution. A psychoacoustic-aligned time-frequency masking loss improves imperceptibility. The method aims to jointly optimize robust detection (deciding whether a watermark is present) and accurate attribution (recovering the embedded message), which it presents as a limitation of prior techniques such as WavMark and AudioSeal.
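To make the message-retrieval idea concrete, the following is a minimal NumPy sketch of a generic cross-attention readout. It is an illustration under stated assumptions, not the architecture from the paper: the function name `cross_attention_decode`, the use of one learned query vector per message bit, the projection matrices, and all sizes are hypothetical. In a design with partial parameter sharing, the per-bit query vectors could plausibly be the part shared with the generator, but that is a guess.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_decode(audio_feats, bit_queries, w_k, w_v, w_out):
    """Hypothetical readout: each message bit attends over audio frames.

    audio_feats: (T, d) frame-level features from a detector encoder
    bit_queries: (n_bits, d) one query vector per message bit (assumed)
    w_k, w_v:    (d, d) key/value projections
    w_out:       (d,) projection from pooled features to a bit logit
    """
    keys = audio_feats @ w_k                                  # (T, d)
    values = audio_feats @ w_v                                # (T, d)
    scores = bit_queries @ keys.T / np.sqrt(keys.shape[-1])   # (n_bits, T)
    attn = softmax(scores, axis=-1)      # each bit's weights over frames
    pooled = attn @ values                                    # (n_bits, d)
    logits = pooled @ w_out                                   # (n_bits,)
    return logits, attn

# Random stand-ins for trained weights and encoder output.
rng = np.random.default_rng(0)
T, d, n_bits = 50, 8, 16
audio_feats = rng.normal(size=(T, d))
bit_queries = rng.normal(size=(n_bits, d))
w_k = rng.normal(size=(d, d))
w_v = rng.normal(size=(d, d))
w_out = rng.normal(size=(d,))

logits, attn = cross_attention_decode(audio_feats, bit_queries, w_k, w_v, w_out)
bits = (logits > 0).astype(int)  # threshold logits to get the decoded message
```

With random weights the decoded bits are meaningless. The point is the data flow: each bit gets its own attention distribution over time frames, so the readout can pull evidence from wherever in the clip the message survives.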
Key facts
- XAttnMark stands for Cross-Attention Robust Audio Watermark.
- It is introduced in arXiv paper 2502.04230.
- The method targets copyright infringement and deepfake audio.
- It uses partial parameter sharing between generator and detector.
- A cross-attention mechanism enables efficient message retrieval.
- A temporal conditioning module improves message distribution.
- A psychoacoustic-aligned time-frequency (TF) masking loss captures frequency masking, improving imperceptibility.
- Prior methods include WavMark and AudioSeal.
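The masking idea behind the TF loss listed above can be illustrated with a toy example. The sketch below is not the loss from the paper: it assumes a crude masking threshold, namely the host signal's energy averaged over a small time-frequency neighbourhood and lowered by a fixed offset, and it penalizes only watermark energy above that threshold. The function names, the neighbourhood radius, and the -20 dB offset are all hypothetical choices.

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    # Magnitude spectrogram from Hann-windowed frames: (n_frames, n_fft//2 + 1).
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def spread(energy, radius=2):
    # Average energy over a (2r+1) x (2r+1) time-frequency neighbourhood.
    padded = np.pad(energy, radius, mode="edge")
    k = 2 * radius + 1
    out = np.zeros_like(energy)
    for dt in range(k):
        for df in range(k):
            out += padded[dt : dt + energy.shape[0], df : df + energy.shape[1]]
    return out / (k * k)

def tf_masking_loss(host, delta, offset_db=-20.0):
    """Toy loss: mean watermark energy exceeding an assumed masking threshold.

    host:  clean audio samples
    delta: additive watermark signal, same length as host
    """
    host_energy = stft_mag(host) ** 2
    delta_energy = stft_mag(delta) ** 2
    threshold = spread(host_energy) * 10 ** (offset_db / 10)
    return np.maximum(delta_energy - threshold, 0.0).mean()

sr = 16000
t = np.arange(sr) / sr
host = np.sin(2 * np.pi * 440 * t)

# Same watermark amplitude, placed under the host tone vs. far away from it.
delta_masked = 0.01 * host
delta_unmasked = 0.01 * np.sin(2 * np.pi * 5000 * t)

loss_masked = tf_masking_loss(host, delta_masked)
loss_unmasked = tf_masking_loss(host, delta_unmasked)
```

Both perturbations have the same amplitude, but the one sitting under the 440 Hz host tone falls below the assumed threshold and incurs a lower penalty than the one at 5 kHz, where the host offers nothing to hide it. A real psychoacoustic model would use frequency-dependent spreading and hearing thresholds instead of a uniform average.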
Entities
Institutions
- arXiv