ARTFEED — Contemporary Art Intelligence

TASTE: A Multimodal Dataset and Benchmark for Music Recommendation

publication · 2026-04-25

A new research paper introduces TASTE, a dataset and benchmarking framework designed to improve music recommendation systems (MRSs) by integrating multimodal information, specifically raw audio signals and textual metadata. The paper argues that existing recommendation models rely too heavily on collaborative filtering, which ignores audio characteristics and degrades in cold-start scenarios, where new tracks or users have little or no interaction history to learn from. It also contends that current datasets lack rich multimodal information, and that existing evaluation frameworks neither fully exploit multimodal data nor support a diverse range of algorithms. TASTE addresses these gaps by providing both audio and textual modalities, with the aim of highlighting their role in music recommendation.
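
The summary does not describe TASTE's actual pipeline, so the following is a minimal content-based sketch under assumed choices: precomputed 128-d audio and 64-d text embeddings (randomly generated stand-ins here), L2-normalized concatenation as the fusion step, and cosine similarity against a mean user profile for scoring. None of these specifics come from the paper.

    # Illustrative sketch only: embedding sizes, fusion, and scoring are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for per-track features: an audio embedding (e.g. from a music
    # model) and a text embedding (e.g. from encoded metadata such as tags).
    audio_emb = rng.normal(size=(1000, 128))   # 1,000 tracks, 128-d audio vectors
    text_emb = rng.normal(size=(1000, 64))     # matching 64-d metadata vectors

    def fuse(audio, text):
        """Concatenate L2-normalized modalities into one content vector per track."""
        a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
        t = text / np.linalg.norm(text, axis=1, keepdims=True)
        return np.hstack([a, t])

    items = fuse(audio_emb, text_emb)

    # A user profile as the mean of fused vectors for tracks they listened to.
    history = [3, 17, 42]
    profile = items[history].mean(axis=0)

    # Rank all tracks by cosine similarity to the profile.
    scores = items @ profile / (np.linalg.norm(items, axis=1) * np.linalg.norm(profile))
    top10 = np.argsort(-scores)[:10]
    print(top10)

Because scoring here depends only on the fused content vectors, even a track with zero interactions can be ranked, which is the cold-start advantage content-based methods have over pure collaborative filtering.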

Key facts

  • Paper titled 'Revisiting Content-Based Music Recommendation: Efficient Feature Aggregation from Large-Scale Music Models'
  • Published on arXiv with ID 2604.20847
  • Proposes TASTE dataset and benchmarking framework
  • TASTE integrates audio and textual modalities
  • Existing MRSs rely on collaborative filtering, leading to suboptimal cold-start performance (a minimal illustration follows this list)
  • Current datasets lack rich multimodal information
  • Evaluation frameworks do not fully leverage multimodal information
  • TASTE aims to highlight the role of multimodal information in music recommendation
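
To make the cold-start point above concrete, here is a minimal sketch assuming a binary user-item play matrix and simple item-item co-occurrence as the collaborative-filtering signal; none of this comes from the paper. A track with no interactions necessarily scores zero under such a scheme.

    import numpy as np

    rng = np.random.default_rng(1)
    R = rng.integers(0, 2, size=(50, 6)).astype(float)  # 50 users x 6 tracks
    R[:, 5] = 0.0  # track 5 is brand new: nobody has played it yet

    def item_cf_scores(R, user):
        """Item-item CF: score tracks by co-occurrence with the user's history."""
        co = R.T @ R                # track-by-track co-occurrence counts
        np.fill_diagonal(co, 0.0)   # exclude trivial self-similarity
        return co @ R[user]

    print(item_cf_scores(R, user=0)[5])  # always 0.0: CF cannot rank the cold track

Content features like those in TASTE sidestep this failure mode, since a new track still carries audio and metadata that can be scored against a user profile.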

Entities

Institutions

  • arXiv

Sources