PitchBench: New Benchmark Tests Pitch Hearing in Audio-Language Models
Researchers have introduced PitchBench, a benchmark designed to measure pitch hearing in audio-language models (ALMs). As ALMs are increasingly deployed in music-related applications such as tutoring, transcription, captioning, recommendation, and production, reliable musical perception becomes critical. Existing benchmarks assess pitch hearing only indirectly, through higher-level tasks and multiple-choice formats. This leaves gaps in evaluating fine-grained pitch identification across instruments, acoustic conditions, and response formats. PitchBench aims to fill these gaps by directly probing fundamental pitch perception. The work is available on arXiv under ID 2605.26176.
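The minimal Python sketch below shows what fine-grained pitch identification involves. It maps a fundamental frequency to its nearest equal-tempered note name and scores a free-form answer by exact match. It assumes twelve-tone equal temperament with A4 = 440 Hz. The functions `freq_to_note` and `score_answer` are hypothetical illustrations. They are not PitchBench's scoring method, which this summary does not describe.

```python
import math

# Note names in one octave of twelve-tone equal temperament (sharps only).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_note(freq_hz: float, a4_hz: float = 440.0) -> str:
    """Map a fundamental frequency to the nearest note name with octave.

    Assumes 12-TET tuning referenced to A4 (MIDI note 69) = a4_hz.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Distance from A4 in semitones, rounded to the nearest MIDI note number.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    name = NOTE_NAMES[midi % 12]
    octave = midi // 12 - 1  # MIDI 60 is C4 in scientific pitch notation
    return f"{name}{octave}"


def score_answer(model_answer: str, true_freq_hz: float) -> bool:
    """Hypothetical exact-match rubric for a free-form note-name answer."""
    return model_answer.strip().upper() == freq_to_note(true_freq_hz).upper()


if __name__ == "__main__":
    print(freq_to_note(440.0))          # A4
    print(freq_to_note(466.16))         # A#4, one semitone higher
    print(score_answer(" a4 ", 440.0))  # True
```

Adjacent semitones differ by a frequency ratio of 2^(1/12), only about 6%. A model must therefore tell A4 at 440 Hz apart from A#4 at about 466.16 Hz. A strict rubric like this one would also reject the enharmonic spelling Bb4 for A#4, which shows how response format alone could affect measured accuracy.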
Key facts
- PitchBench is a new benchmark for measuring pitch hearing in audio-language models.
- Audio-language models are used in music tutoring, transcription, captioning, recommendation, and production.
- Existing benchmarks assess pitch hearing indirectly through higher-level tasks.
- Current evaluations often use multiple-choice formats and do not test fine-grained pitch identification.
- PitchBench probes pitch hearing across instruments, acoustic conditions, and response formats.
- The research is published on arXiv with ID 2605.26176.
- Reliable musical perception is critical for ALMs in real-world applications.
- ALMs are becoming important components of multimodal AI systems.
Entities
Institutions
- arXiv