ARTFEED — Contemporary Art Intelligence

TASTE: A Multimodal Dataset and Benchmark for Music Recommendation

publication · 2026-04-25

A new research paper introduces TASTE, a dataset and benchmarking framework designed to improve music recommendation systems (MRSs) by integrating multimodal information, specifically raw audio signals and textual metadata. The paper argues that existing recommendation models rely too heavily on collaborative filtering, which ignores audio characteristics and degrades in cold-start scenarios, where new tracks or users have little or no interaction history to learn from. It also contends that current datasets lack rich multimodal information, and that existing evaluation frameworks neither fully exploit multimodal data nor support a diverse range of algorithms. TASTE addresses these gaps by providing both audio and textual modalities, with the aim of highlighting their role in music recommendation.
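
The summary does not describe TASTE's actual pipeline, so the following is a minimal content-based sketch under assumed choices: precomputed 128-d audio and 64-d text embeddings (randomly generated stand-ins here), L2-normalized concatenation as the fusion step, and cosine similarity against a mean user profile for scoring. None of these specifics come from the paper.

    # Illustrative sketch only: embedding sizes, fusion, and scoring are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-ins for per-track features: an audio embedding (e.g. from a music
    # model) and a text embedding (e.g. from encoded metadata such as tags).
    audio_emb = rng.normal(size=(1000, 128))   # 1,000 tracks, 128-d audio vectors
    text_emb = rng.normal(size=(1000, 64))     # matching 64-d metadata vectors

    def fuse(audio, text):
        """Concatenate L2-normalized modalities into one content vector per track."""
        a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
        t = text / np.linalg.norm(text, axis=1, keepdims=True)
        return np.hstack([a, t])

    items = fuse(audio_emb, text_emb)

    # A user profile as the mean of fused vectors for tracks they listened to.
    history = [3, 17, 42]
    profile = items[history].mean(axis=0)

    # Rank all tracks by cosine similarity to the profile.
    scores = items @ profile / (np.linalg.norm(items, axis=1) * np.linalg.norm(profile))
    top10 = np.argsort(-scores)[:10]
    print(top10)

Because scoring here depends only on the fused content vectors, even a track with zero interactions can be ranked, which is the cold-start advantage content-based methods have over pure collaborative filtering.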

Key facts

  • Paper titled 'Revisiting Content-Based Music Recommendation: Efficient Feature Aggregation from Large-Scale Music Models'
  • Published on arXiv with ID 2604.20847
  • Proposes TASTE dataset and benchmarking framework
  • TASTE integrates audio and textual modalities
  • Existing MRSs rely on collaborative filtering, leading to suboptimal cold-start performance (a minimal illustration follows this list)
  • Current datasets lack rich multimodal information
  • Evaluation frameworks do not fully leverage multimodal information
  • TASTE aims to highlight the role of multimodal information in music recommendation
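
To make the cold-start point above concrete, here is a minimal sketch assuming a binary user-item play matrix and simple item-item co-occurrence as the collaborative-filtering signal; none of this comes from the paper. A track with no interactions necessarily scores zero under such a scheme.

    import numpy as np

    rng = np.random.default_rng(1)
    R = rng.integers(0, 2, size=(50, 6)).astype(float)  # 50 users x 6 tracks
    R[:, 5] = 0.0  # track 5 is brand new: nobody has played it yet

    def item_cf_scores(R, user):
        """Item-item CF: score tracks by co-occurrence with the user's history."""
        co = R.T @ R                # track-by-track co-occurrence counts
        np.fill_diagonal(co, 0.0)   # exclude trivial self-similarity
        return co @ R[user]

    print(item_cf_scores(R, user=0)[5])  # always 0.0: CF cannot rank the cold track

Content features like those in TASTE sidestep this failure mode, since a new track still carries audio and metadata that can be scored against a user profile.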

Entities

Institutions

  • arXiv

Sources