ARTFEED — Contemporary Art Intelligence

Open ASR Leaderboard Adds Private Datasets to Combat Benchmaxxing

other · 2026-05-06

The Open ASR Leaderboard, launched in September 2023 and visited more than 710K times since, has added private datasets from Appen Inc. and DataoceanAI to mitigate benchmaxxing and test-set contamination. These high-quality English ASR datasets cover scripted and conversational speech across multiple accents. The datasets are kept private to prevent exploitation, and the leaderboard's default Average WER (word error rate) is still computed on public datasets only, with an optional toggle to include the private data. The leaderboard emphasizes standardization, through a Whisper-based normalizer that removes punctuation and casing and maps spellings to American English, and openness, through open-sourced UI code and evaluation scripts. New metrics include macro-averages for scripted speech, conversational speech, US accents, and non-US accents; per-split scores are withheld to discourage optimizing for individual splits. Model developers can submit models via GitHub pull requests; results are verified on the public sets, and private metrics are computed separately. The initiative aims to give a more holistic view of ASR performance, acknowledging that no single model excels across all dimensions. Future plans include evaluations that reflect real-world noisy conditions.
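To make the normalization and scoring steps concrete, here is a minimal Python sketch of how text might be normalized before WER is computed. It is not the leaderboard's actual code: the real normalizer is based on Whisper and is far more thorough. The function names `normalize` and `wer` and the three-word spelling table are assumptions made for illustration only.

```python
import re

# Hypothetical, deliberately tiny British-to-American table (an assumption).
# A production normalizer would use a much larger mapping.
BRITISH_TO_AMERICAN = {
    "colour": "color",
    "centre": "center",
    "organise": "organize",
}


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and map a few spellings to American English."""
    text = text.lower()
    # Replace everything except word characters, whitespace, and apostrophes.
    text = re.sub(r"[^\w\s']", " ", text)
    words = [BRITISH_TO_AMERICAN.get(word, word) for word in text.split()]
    return " ".join(words)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because both strings pass through the same normalizer, a transcript that differs from the reference only in casing, punctuation, or a mapped spelling scores a WER of zero, which is the point of standardizing before scoring.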

Key facts

  • Open ASR Leaderboard launched September 2023.
  • Over 710K visits since launch.
  • Private datasets from Appen Inc. and DataoceanAI added.
  • Datasets cover scripted and conversational speech, multiple accents.
  • Datasets kept private to prevent benchmaxxing.
  • Default Average WER uses public datasets only.
  • Toggle option to include private datasets.
  • Normalizer based on Whisper removes punctuation/casing, maps to American spelling.
  • UI code and evaluation scripts open-sourced.
  • New metrics: Avg Scripted, Avg Conversational, Avg US, Avg non-US.
  • Per-split scores withheld to discourage optimizing for individual splits.
  • Model submission via GitHub pull requests.
  • Results verified on public sets, private metrics computed separately.
  • Future evaluations planned for real-world noisy conditions.
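
The averaging scheme in the key facts above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the leaderboard's implementation: the `SplitResult` type, its field names, and the choice to compute the four macro-averages over private splits only are all hypothetical, and any numbers passed in would be placeholders rather than real results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SplitResult:
    """One WER score for one dataset split (hypothetical structure)."""
    name: str
    wer: float
    private: bool
    style: Optional[str] = None   # "scripted" or "conversational"
    accent: Optional[str] = None  # "us" or "non_us"


def average_wer(results: List[SplitResult], include_private: bool = False) -> float:
    """Default average uses public splits only; the toggle adds private ones."""
    selected = [r.wer for r in results if include_private or not r.private]
    return mean(selected)


def private_macroaverages(results: List[SplitResult]) -> Dict[str, float]:
    """Report only group-level averages, never the per-split scores."""
    private = [r for r in results if r.private]
    groups = {
        "Avg Scripted": [r.wer for r in private if r.style == "scripted"],
        "Avg Conversational": [r.wer for r in private if r.style == "conversational"],
        "Avg US": [r.wer for r in private if r.accent == "us"],
        "Avg non-US": [r.wer for r in private if r.accent == "non_us"],
    }
    # Skip empty groups so mean() is never called on an empty list.
    return {label: mean(scores) for label, scores in groups.items() if scores}
```

Exposing only the grouped averages means a submitter sees how a model does on conversational or non-US speech overall, but cannot tell which individual private split to tune against.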

Entities

Institutions

  • Open ASR Leaderboard
  • Hugging Face
  • Appen Inc.
  • DataoceanAI
  • GitHub
  • Whisper

Sources