Simpson's Paradox in Machine-Generated Text Detection
A new arXiv paper (2605.06294) reveals that the dominant method for distinguishing human-written from machine-generated text suffers from Simpson's paradox. The likelihood hypothesis, which assumes that machine-generated text is assigned higher likelihood by a detector model, breaks down because the token-level signal is non-uniform across the model's hidden space. Naively averaging likelihood scores over regions with different statistical structure can wash out, or even reverse, strong local signals. The authors propose a learned local calibration step grounded in Bayesian decision theory, using lightweight predictors of score distributions conditioned on position to correct these aggregation errors.
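To make the aggregation failure concrete, here is a toy numerical illustration with synthetic numbers (not data from the paper): machine text scores higher than human text within every region of the detector's space, yet the pooled averages point the other way because the two classes occupy the regions in different proportions.

```python
"""Toy illustration (synthetic, not the paper's data or method) of Simpson's
paradox in naive averaging of token-level likelihood scores."""

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-region mean scores: machine > human in BOTH regions.
REGION_MEANS = {
    "predictable": {"human": 0.80, "machine": 0.90},
    "surprising":  {"human": 0.20, "machine": 0.30},
}

# Hypothetical share of each class's tokens falling in each region.
REGION_WEIGHTS = {
    "human":   {"predictable": 0.90, "surprising": 0.10},
    "machine": {"predictable": 0.30, "surprising": 0.70},
}

N_TOKENS = 10_000

def sample_scores(label: str) -> dict:
    """Draw noisy token scores for one class, grouped by region."""
    scores = {}
    for region, weight in REGION_WEIGHTS[label].items():
        n = int(N_TOKENS * weight)
        mean = REGION_MEANS[region][label]
        scores[region] = rng.normal(loc=mean, scale=0.05, size=n)
    return scores

human = sample_scores("human")
machine = sample_scores("machine")

# Within each region, machine text scores higher than human text.
for region in REGION_MEANS:
    print(f"{region:>12}: human={human[region].mean():.3f}  "
          f"machine={machine[region].mean():.3f}")

# Pooled (naive) averages reverse the ordering: the paradox.
pooled_human = np.concatenate(list(human.values())).mean()
pooled_machine = np.concatenate(list(machine.values())).mean()
print(f"{'pooled':>12}: human={pooled_human:.3f}  "
      f"machine={pooled_machine:.3f}")
```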
Key facts
- arXiv paper 2605.06294 addresses detection of machine-generated text.
- Dominant approach relies on the likelihood hypothesis: machine-generated text receives higher likelihood under a detector model.
- Token-level signal is non-uniform across hidden space of detector model.
- Naive averaging causes Simpson's paradox, destroying strong local signals.
- Proposed solution: learned local calibration step based on Bayesian decision theory.
- Calibration uses lightweight predictors of score distributions conditioned on position (a rough sketch follows this list).
- Paper demonstrates that inappropriate aggregation is a key flaw in current detectors.
- The work targets the societally important problem of reliably distinguishing human-written from AI-generated text.
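The facts above describe the proposed fix only at a high level. The following minimal sketch assumes the calibration amounts to standardizing each token's raw score against a lightweight, position-conditioned estimate of the local score distribution before averaging; the predictor family, position features, and decision rule here are illustrative assumptions, not the paper's implementation.

```python
"""Minimal sketch of a local-calibration step before aggregation.

Assumption (not taken from the paper): a lightweight regressor predicts the
mean and spread of the human-text score distribution conditioned on token
position, and each token's raw likelihood score is standardized against that
local distribution before averaging, so regions with different statistical
structure contribute on a comparable scale."""

import numpy as np
from sklearn.linear_model import LinearRegression

def fit_local_calibrators(positions, human_scores):
    """Fit lightweight predictors of the local score mean and spread.

    positions:    (n, d) position features (e.g., normalized token index)
    human_scores: (n,)   raw likelihood scores from known human text
    """
    mean_model = LinearRegression().fit(positions, human_scores)
    residual = np.abs(human_scores - mean_model.predict(positions))
    std_model = LinearRegression().fit(positions, residual)
    return mean_model, std_model

def calibrated_score(positions, scores, mean_model, std_model):
    """Average of locally standardized scores; higher means more machine-like."""
    mu = mean_model.predict(positions)
    sigma = np.maximum(std_model.predict(positions), 1e-3)  # guard against zero spread
    return float(np.mean((scores - mu) / sigma))

# Toy usage with synthetic data (all numbers illustrative).
rng = np.random.default_rng(1)
pos = rng.uniform(0, 1, size=(5000, 1))                       # stand-in position feature
human_train = 0.2 + 0.6 * pos[:, 0] + rng.normal(0, 0.05, 5000)
mean_model, std_model = fit_local_calibrators(pos, human_train)

test_pos = rng.uniform(0, 1, size=(200, 1))
human_doc = 0.2 + 0.6 * test_pos[:, 0] + rng.normal(0, 0.05, 200)
machine_doc = human_doc + 0.05                                # slightly higher everywhere

print("human doc  :", round(calibrated_score(test_pos, human_doc, mean_model, std_model), 3))
print("machine doc:", round(calibrated_score(test_pos, machine_doc, mean_model, std_model), 3))
```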