ARTFEED — Contemporary Art Intelligence

PHALAR: New AI Model Boosts Music Stem Retrieval by 70%

ai-technology · 2026-05-07

Researchers have introduced PHALAR, a contrastive learning framework for musical audio representation that achieves up to a 70% relative accuracy improvement over state-of-the-art methods on stem retrieval tasks. The model uses a Learned Spectral Pooling layer and a complex-valued head to enforce pitch- and phase-equivariant inductive biases, while requiring fewer than half the parameters of the state-of-the-art baselines and training roughly 7x faster. PHALAR sets new retrieval benchmarks on the MoisesDB, Slakh, and ChocoChorales datasets, and its outputs correlate significantly more strongly with human judgments of musical coherence than those of semantic baselines. In addition, zero-shot beat tracking and linear chord probing show that PHALAR captures robust musical structure beyond retrieval. The paper is available on arXiv.
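The article does not spell out PHALAR's training objective, but contrastive retrieval frameworks of this kind are typically trained with an InfoNCE-style loss: embeddings of a stem and its source mixture (or sibling stems from the same track) form positive pairs, while stems from other tracks act as negatives. A minimal, illustrative sketch of that loss, not the paper's actual implementation:

```python
import numpy as np

def info_nce(queries, keys, temperature=0.1):
    """InfoNCE loss: row i of `queries` should match row i of `keys`.

    `queries` might be mixture embeddings, `keys` the matching stem
    embeddings; all other rows serve as in-batch negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    # Log-softmax over each row, computed stably
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal: query i pairs with key i
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_aligned = info_nce(emb, emb)                       # perfectly matched pairs
loss_random = info_nce(emb, rng.normal(size=(8, 16)))   # unrelated pairs
```

At retrieval time, a query stem's embedding is compared against a database by cosine similarity, so a lower contrastive loss during training directly translates into better nearest-neighbour retrieval.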

Key facts

  • PHALAR achieves up to 70% relative accuracy increase over state-of-the-art
  • Uses Learned Spectral Pooling layer and complex-valued head
  • Uses fewer than 50% of the parameters of prior models and trains 7x faster
  • New state-of-the-art on MoisesDB, Slakh, and ChocoChorales
  • Correlates more strongly with human coherence judgments than semantic baselines
  • Zero-shot beat tracking and linear chord probing confirm musical structure capture
  • Published on arXiv under Computer Science > Sound
  • Contrastive framework for stem retrieval
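The linear chord probing mentioned above is a standard diagnostic: the learned embeddings are frozen and only a linear classifier is fit on top, so any accuracy achieved must come from structure already present in the representation. A minimal sketch with synthetic data (the feature dimension, class count, and training loop are illustrative, not taken from the paper):

```python
import numpy as np

def linear_probe(X, y, n_classes, lr=0.5, steps=300):
    """Fit a linear softmax classifier on frozen embeddings X (no fine-tuning)."""
    W = np.zeros((X.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of mean cross-entropy w.r.t. W
        W -= lr * X.T @ (probs - onehot) / len(X)
    return W

# Stand-in "embeddings": three well-separated clusters, one per chord class
rng = np.random.default_rng(0)
centers = np.array([[4.0, 0.0], [0.0, 4.0], [-4.0, -4.0]])
X = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in centers])
y = np.repeat(np.arange(3), 40)

W = linear_probe(X, y, n_classes=3)
accuracy = np.mean((X @ W).argmax(axis=1) == y)
```

High probe accuracy on real chord labels would indicate, as the article reports, that harmonic structure is linearly decodable from PHALAR's embeddings without any task-specific training of the encoder.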

Entities

Institutions

  • arXiv

Sources