ARTFEED — Contemporary Art Intelligence

New Metric Measures Lexical Diversity Loss in LLM Decoding

ai-technology · 2026-05-27

Researchers have introduced the Word Coverage Score (WCS), a metric that quantifies how standard sampling filters such as Top-p, Top-k, and Min-p suppress low-frequency, high-information words in large language models (LLMs). The study, published on arXiv (2605.27268), audits open-weight models on human-authored corpus fragments to measure lexical survival rates. The findings provide quantitative evidence that decoding mechanics, and not model knowledge alone, contribute to repetitive and homogeneous text generation. WCS identifies which contextually appropriate words chosen by human authors become unreachable once a filter prunes the candidate set, even though those words still exist in the model's probability space.
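To make the pruning effect concrete, here is a minimal sketch of the three filters and a toy "survival rate" check. It is an illustration under stated assumptions, not the paper's method: this summary does not give the actual WCS formula, so the survival_rate function, the STEPS data, and all probabilities below are hypothetical. The filters follow their commonly described behavior: Top-k keeps the k most probable tokens, Top-p keeps the smallest high-probability prefix whose cumulative mass reaches p, and Min-p keeps tokens whose probability is at least min_p times the top token's probability.

```python
from typing import Callable, Dict, List, Set, Tuple

Probs = Dict[str, float]


def top_k_keep(probs: Probs, k: int) -> Set[str]:
    """Keep the k most probable tokens."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return set(ranked[:k])


def top_p_keep(probs: Probs, p: float) -> Set[str]:
    """Keep the smallest top-ranked set whose cumulative probability reaches p."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    kept: Set[str] = set()
    cumulative = 0.0
    for token in ranked:
        kept.add(token)
        cumulative += probs[token]
        if cumulative >= p:
            break
    return kept


def min_p_keep(probs: Probs, min_p: float) -> Set[str]:
    """Keep tokens with probability >= min_p * (probability of the top token)."""
    threshold = min_p * max(probs.values())
    return {token for token, prob in probs.items() if prob >= threshold}


def survival_rate(
    steps: List[Tuple[Probs, str]],
    keep_fn: Callable[[Probs], Set[str]],
) -> float:
    """Hypothetical stand-in for a coverage score: the fraction of positions
    where the word the human author actually wrote survives the filter."""
    survived = sum(1 for probs, human_word in steps if human_word in keep_fn(probs))
    return survived / len(steps)


# Invented toy data: (model distribution at one position, word the human used).
STEPS: List[Tuple[Probs, str]] = [
    ({"big": 0.55, "large": 0.30, "huge": 0.10, "gargantuan": 0.05}, "gargantuan"),
    ({"walked": 0.60, "ran": 0.25, "sauntered": 0.15}, "walked"),
]

if __name__ == "__main__":
    print("no filter:", survival_rate(STEPS, lambda pr: set(pr)))
    print("top-k=2:  ", survival_rate(STEPS, lambda pr: top_k_keep(pr, 2)))
    print("top-p=0.9:", survival_rate(STEPS, lambda pr: top_p_keep(pr, 0.9)))
    print("min-p=0.1:", survival_rate(STEPS, lambda pr: min_p_keep(pr, 0.1)))
```

In this toy setup the rare word "gargantuan" has nonzero probability yet is excluded by all three filters at the chosen settings, so it can never be sampled at that position. That is the kind of unreachability the article describes.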

Key facts

  • Word Coverage Score (WCS) introduced as a metric
  • Measures lexical survival rate of low-frequency words
  • Audits open-weight models on human-authored corpus
  • Focuses on decoding mechanics (Top-p, Top-k, Min-p)
  • Published on arXiv with ID 2605.27268
  • Addresses the criticism that LLM-generated text is repetitive
  • Quantifies suppression of linguistic diversity
  • Shows that words can become unreachable despite existing in the probability space

Entities

Institutions

  • arXiv
