LLM Benchmark Datasets Must Be Contamination-Resistant

publication · 2026-05-20

A new paper on arXiv (2605.19999) argues that LLM benchmark datasets should be designed to be contamination-resistant: unlearnable by models, yet still usable for inference. The authors point to widespread contamination of current benchmarks, which end up included in pretraining corpora, and note that this undermines their reliability as measures of model generalization. To achieve contamination resistance, they propose exploiting the asymmetry between the inference and training pipelines in Transformer architectures, and they call for mathematical advances to ensure interoperability across LLM architectures.

Key facts

The paper argues that benchmark datasets should be contamination-resistant (unlearnable, but still supporting inference).
Current benchmarks are often contaminated through inclusion in pretraining corpora.
Contamination diminishes a benchmark's value for measuring model generalization.
The authors propose using the asymmetry between the inference and training pipelines in Transformers.
Mathematical advances are needed for cross-architecture interoperability.
The paper is a call to action for the research community.
arXiv ID: 2605.19999.
Published as arXiv preprint.
