PHALAR: New AI Model Boosts Music Stem Retrieval by Up to 70%
Researchers have introduced PHALAR, a contrastive learning framework for musical audio representation that achieves up to a 70% relative accuracy improvement over state-of-the-art methods on stem retrieval tasks. The model uses a Learned Spectral Pooling layer and a complex-valued head to build in pitch- and phase-equivariant inductive biases, while requiring less than half as many parameters and training roughly 7x faster. PHALAR sets new retrieval benchmarks on the MoisesDB, Slakh, and ChocoChorales datasets, and its outputs correlate significantly more strongly with human coherence judgments than those of semantic baselines. Zero-shot beat tracking and linear chord probing further demonstrate that PHALAR captures robust musical structure beyond retrieval. The paper is available on arXiv.
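The paper's exact architecture is not reproduced here, but the reported ingredients map onto a standard contrastive recipe. The sketch below is a minimal, hypothetical PyTorch rendering: `LearnedSpectralPooling` is assumed to learn per-frequency weights for aggregating a spectrogram, `ComplexHead` is assumed to emit complex-valued embeddings whose similarity is invariant to a global phase rotation, and a mix-to-stem InfoNCE pairing stands in for PHALAR's unspecified training objective. All names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a PHALAR-style contrastive setup; the paper's actual
# layers, loss, and hyperparameters are not public here and are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnedSpectralPooling(nn.Module):
    """Assumed design: pool a (batch, freq, time) spectrogram over frequency
    with a learned softmax-weighted profile instead of fixed mean/max pooling."""

    def __init__(self, n_freq: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_freq))

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.logits, dim=0)          # (freq,) learned weights
        return torch.einsum("bft,f->bt", spec, w)      # -> (batch, time)


class ComplexHead(nn.Module):
    """Assumed design: project pooled features to a complex-valued embedding,
    so a global phase offset leaves pairwise similarity magnitudes unchanged."""

    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, 2 * emb_dim)     # real and imaginary parts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        re, im = self.proj(x).chunk(2, dim=-1)
        z = torch.complex(re, im)
        norm = (z.real**2 + z.imag**2).sum(-1, keepdim=True).sqrt().clamp(min=1e-8)
        return z / norm                                # unit-norm complex embedding


def info_nce(z_mix: torch.Tensor, z_stem: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Standard InfoNCE over complex embeddings: similarity is the magnitude of
    the complex inner product, which discards any global phase difference."""
    sim = torch.abs(z_mix @ z_stem.conj().T) / tau     # (batch, batch) logits
    labels = torch.arange(sim.size(0))                 # matched mix/stem pairs
    return F.cross_entropy(sim, labels)


if __name__ == "__main__":
    batch, n_freq, n_time, emb_dim = 8, 128, 64, 32
    pool = LearnedSpectralPooling(n_freq)
    head = ComplexHead(n_time, emb_dim)
    mix_spec = torch.rand(batch, n_freq, n_time)       # stand-in spectrograms
    stem_spec = torch.rand(batch, n_freq, n_time)
    loss = info_nce(head(pool(mix_spec)), head(pool(stem_spec)))
    print(f"toy contrastive loss: {loss.item():.3f}")
```

Taking the magnitude of the complex inner product is one plausible way to make a complex-valued head phase-insensitive at retrieval time; whether PHALAR uses this exact similarity is an open assumption of the sketch.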
Key facts
- PHALAR achieves up to a 70% relative accuracy increase over the state of the art
- Uses Learned Spectral Pooling layer and complex-valued head
- Uses less than half the parameters and trains 7x faster
- New state-of-the-art on MoisesDB, Slakh, and ChocoChorales
- Correlates more strongly with human coherence judgments than semantic baselines
- Zero-shot beat tracking and linear chord probing confirm that musical structure is captured (see the probe sketch after this list)
- Published on arXiv under Computer Science > Sound
- Contrastive framework for stem retrieval
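For the chord-probing result, the standard protocol is to freeze the pretrained encoder and fit a linear classifier on its embeddings; above-chance probe accuracy implies the representation encodes chord information. Below is a generic sketch under that assumption; the paper's actual probe setup, label set, and data pipeline are not specified here, and the random arrays are placeholders for real embeddings and annotations.

```python
# Generic linear-probe sketch; PHALAR's actual probe configuration is assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-ins for frozen PHALAR embeddings and frame-level chord labels.
n_frames, emb_dim, n_chords = 2000, 32, 24           # e.g. 12 roots x {maj, min}
X = rng.normal(size=(n_frames, emb_dim))             # would come from the frozen encoder
y = rng.integers(0, n_chords, size=n_frames)         # would come from chord annotations

# A linear probe: the encoder stays frozen, only this classifier is trained.
split = int(0.8 * n_frames)
probe = LogisticRegression(max_iter=1000).fit(X[:split], y[:split])
print(f"probe accuracy: {accuracy_score(y[split:], probe.predict(X[split:])):.3f}")
```

With random features as above, the probe should score near chance (about 1/24); the paper's claim is that real PHALAR embeddings score well above that.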