PHALAR: New AI Model Boosts Music Stem Retrieval by Up to 70%
Researchers have introduced PHALAR, a contrastive learning framework for musical audio representation that achieves up to a 70% relative accuracy improvement over state-of-the-art methods on stem retrieval tasks. The model uses a Learned Spectral Pooling layer and a complex-valued head to build in pitch- and phase-equivariant inductive biases, while requiring less than half as many parameters and training roughly 7x faster. PHALAR sets new retrieval benchmarks on the MoisesDB, Slakh, and ChocoChorales datasets, and its outputs correlate significantly more strongly with human coherence judgments than those of semantic baselines. Zero-shot beat tracking and linear chord probing further demonstrate that PHALAR captures robust musical structure beyond retrieval. The paper is available on arXiv.
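The paper's exact architecture is not reproduced here, but the reported ingredients map onto a standard contrastive recipe. The sketch below is a minimal, hypothetical PyTorch rendering: `LearnedSpectralPooling` is assumed to learn per-frequency weights for aggregating a spectrogram, `ComplexHead` is assumed to emit complex-valued embeddings whose similarity is invariant to a global phase rotation, and a mix-to-stem InfoNCE pairing stands in for PHALAR's unspecified training objective. All names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a PHALAR-style contrastive setup; the paper's actual
# layers, loss, and hyperparameters are not public here and are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnedSpectralPooling(nn.Module):
    """Assumed design: pool a (batch, freq, time) spectrogram over frequency
    with a learned softmax-weighted profile instead of fixed mean/max pooling."""

    def __init__(self, n_freq: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_freq))

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.logits, dim=0)          # (freq,) learned weights
        return torch.einsum("bft,f->bt", spec, w)      # -> (batch, time)


class ComplexHead(nn.Module):
    """Assumed design: project pooled features to a complex-valued embedding,
    so a global phase offset leaves pairwise similarity magnitudes unchanged."""

    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, 2 * emb_dim)     # real and imaginary parts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        re, im = self.proj(x).chunk(2, dim=-1)
        z = torch.complex(re, im)
        norm = (z.real**2 + z.imag**2).sum(-1, keepdim=True).sqrt().clamp(min=1e-8)
        return z / norm                                # unit-norm complex embedding


def info_nce(z_mix: torch.Tensor, z_stem: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Standard InfoNCE over complex embeddings: similarity is the magnitude of
    the complex inner product, which discards any global phase difference."""
    sim = torch.abs(z_mix @ z_stem.conj().T) / tau     # (batch, batch) logits
    labels = torch.arange(sim.size(0))                 # matched mix/stem pairs
    return F.cross_entropy(sim, labels)


if __name__ == "__main__":
    batch, n_freq, n_time, emb_dim = 8, 128, 64, 32
    pool = LearnedSpectralPooling(n_freq)
    head = ComplexHead(n_time, emb_dim)
    mix_spec = torch.rand(batch, n_freq, n_time)       # stand-in spectrograms
    stem_spec = torch.rand(batch, n_freq, n_time)
    loss = info_nce(head(pool(mix_spec)), head(pool(stem_spec)))
    print(f"toy contrastive loss: {loss.item():.3f}")
```

Taking the magnitude of the complex inner product is one plausible way to make a complex-valued head phase-insensitive at retrieval time; whether PHALAR uses this exact similarity is an open assumption of the sketch.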
Key facts
- PHALAR achieves up to a 70% relative accuracy increase over the state of the art
- Uses Learned Spectral Pooling layer and complex-valued head
- Uses less than half the parameters and trains 7x faster
- New state-of-the-art on MoisesDB, Slakh, and ChocoChorales
- Correlates more strongly with human coherence judgments than semantic baselines
- Zero-shot beat tracking and linear chord probing confirm that musical structure is captured (see the probe sketch after this list)
- Published on arXiv under Computer Science > Sound
- Contrastive framework for stem retrieval
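For the chord-probing result, the standard protocol is to freeze the pretrained encoder and fit a linear classifier on its embeddings; above-chance probe accuracy implies the representation encodes chord information. Below is a generic sketch under that assumption; the paper's actual probe setup, label set, and data pipeline are not specified here, and the random arrays are placeholders for real embeddings and annotations.

```python
# Generic linear-probe sketch; PHALAR's actual probe configuration is assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-ins for frozen PHALAR embeddings and frame-level chord labels.
n_frames, emb_dim, n_chords = 2000, 32, 24           # e.g. 12 roots x {maj, min}
X = rng.normal(size=(n_frames, emb_dim))             # would come from the frozen encoder
y = rng.integers(0, n_chords, size=n_frames)         # would come from chord annotations

# A linear probe: the encoder stays frozen, only this classifier is trained.
split = int(0.8 * n_frames)
probe = LogisticRegression(max_iter=1000).fit(X[:split], y[:split])
print(f"probe accuracy: {accuracy_score(y[split:], probe.predict(X[split:])):.3f}")
```

With random features as above, the probe should score near chance (about 1/24); the paper's claim is that real PHALAR embeddings score well above that.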