ARTFEED — Contemporary Art Intelligence

ONOTE Benchmark Targets Omnimodal Music Notation Processing

other · 2026-04-24

Researchers have introduced ONOTE, a multi-format benchmark designed to evaluate omnimodal notation processing (ONP) in AI systems. ONP requires alignment across auditory, visual, and symbolic domains, but current research is fragmented and biased toward Western staff notation. Existing evaluation approaches, including 'LLM-as-a-judge' scoring, are prone to hallucinations and fail to assess structural reasoning. ONOTE instead uses a deterministic pipeline based on canonical pitch projection, which is intended to eliminate subjective scoring biases across diverse notation systems. An evaluation of leading omnimodal models reveals a disconnect between perceptual accuracy and music-theoretic understanding.
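To make the idea of canonical pitch projection concrete, here is a minimal sketch, not the paper's pipeline. It assumes the canonical form is a MIDI note number and projects two notation systems onto it: scientific pitch names as a stand-in for staff notation, and numbered (jianpu-style) scale degrees in a major key. The function names, the choice of MIDI as the canonical space, and the exact-match score are all illustrative assumptions.

```python
# Hypothetical sketch: none of these names or design choices come from the ONOTE paper.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of scale degrees 1..7


def staff_to_midi(name: str) -> int:
    """Project a scientific pitch name such as 'C4' or 'F#5' to a MIDI number."""
    letter, rest = name[0].upper(), name[1:]
    accidental = 0
    while rest and rest[0] in "#b":
        accidental += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + NOTE_OFFSETS[letter] + accidental


def jianpu_to_midi(degree: int, octave_shift: int = 0, tonic: str = "C4") -> int:
    """Project a numbered-notation degree (1..7) in a major key to a MIDI number."""
    if not 1 <= degree <= 7:
        raise ValueError("degree must be in 1..7")
    return staff_to_midi(tonic) + MAJOR_SCALE[degree - 1] + 12 * octave_shift


def pitch_accuracy(predicted: list[int], reference: list[int]) -> float:
    """Fraction of reference positions whose canonical pitch matches exactly."""
    if not reference:
        return 0.0
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


# Reference written as pitch names, prediction written as numbered degrees in C major.
reference = [staff_to_midi(n) for n in ["C4", "D4", "E4", "G4"]]
predicted = [jianpu_to_midi(d) for d in [1, 2, 3, 4]]
score = pitch_accuracy(predicted, reference)  # last degree is 4 (F4), not 5 (G4)
```

Because both inputs are reduced to the same integer space before comparison, the score depends only on the pitches and not on which notation system produced them, and no judge model is involved.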

Key facts

  • ONOTE is a multi-format benchmark for omnimodal notation processing.
  • Current research on ONP is fragmented and biased toward Western staff notation.
  • LLM-as-a-judge metrics are unreliable due to systemic hallucinations.
  • ONOTE uses a deterministic pipeline grounded in canonical pitch projection.
  • The benchmark eliminates subjective scoring biases across diverse notation systems.
  • Evaluation shows a disconnect between perceptual accuracy and music-theoretic understanding.
  • The paper is available on arXiv under ID 2604.20719.
  • ONP requires alignment across auditory, visual, and symbolic domains.

Entities

Institutions

  • arXiv