Simpson's Paradox in Machine-Generated Text Detection
A new arXiv paper (2605.06294) reveals that the dominant method for distinguishing human-written from machine-generated text suffers from Simpson's paradox. The likelihood hypothesis, which assumes that machine-generated text is assigned higher likelihood by a detector model, breaks down because the token-level signal is non-uniform across the model's hidden space. Naively averaging likelihood scores over regions with different statistical structure can wash out, or even reverse, strong local signals. The authors propose a learned local calibration step grounded in Bayesian decision theory, using lightweight predictors of score distributions conditioned on position to correct these aggregation errors.
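To make the aggregation failure concrete, here is a toy numerical illustration with synthetic numbers (not data from the paper): machine text scores higher than human text within every region of the detector's space, yet the pooled averages point the other way because the two classes occupy the regions in different proportions.

```python
"""Toy illustration (synthetic, not the paper's data or method) of Simpson's
paradox in naive averaging of token-level likelihood scores."""

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-region mean scores: machine > human in BOTH regions.
REGION_MEANS = {
    "predictable": {"human": 0.80, "machine": 0.90},
    "surprising":  {"human": 0.20, "machine": 0.30},
}

# Hypothetical share of each class's tokens falling in each region.
REGION_WEIGHTS = {
    "human":   {"predictable": 0.90, "surprising": 0.10},
    "machine": {"predictable": 0.30, "surprising": 0.70},
}

N_TOKENS = 10_000

def sample_scores(label: str) -> dict:
    """Draw noisy token scores for one class, grouped by region."""
    scores = {}
    for region, weight in REGION_WEIGHTS[label].items():
        n = int(N_TOKENS * weight)
        mean = REGION_MEANS[region][label]
        scores[region] = rng.normal(loc=mean, scale=0.05, size=n)
    return scores

human = sample_scores("human")
machine = sample_scores("machine")

# Within each region, machine text scores higher than human text.
for region in REGION_MEANS:
    print(f"{region:>12}: human={human[region].mean():.3f}  "
          f"machine={machine[region].mean():.3f}")

# Pooled (naive) averages reverse the ordering: the paradox.
pooled_human = np.concatenate(list(human.values())).mean()
pooled_machine = np.concatenate(list(machine.values())).mean()
print(f"{'pooled':>12}: human={pooled_human:.3f}  "
      f"machine={pooled_machine:.3f}")
```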
Key facts
- arXiv paper 2605.06294 addresses detection of machine-generated text.
- Dominant approach relies on the likelihood hypothesis: machine-generated text receives higher likelihood under a detector model.
- Token-level signal is non-uniform across hidden space of detector model.
- Naive averaging causes Simpson's paradox, destroying strong local signals.
- Proposed solution: learned local calibration step based on Bayesian decision theory.
- Calibration uses lightweight predictors of score distributions conditioned on position (a rough sketch follows this list).
- Paper demonstrates that inappropriate aggregation is a key flaw in current detectors.
- The work targets the societally important problem of reliably distinguishing human-written from AI-generated text.
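The facts above describe the proposed fix only at a high level. The following minimal sketch assumes the calibration amounts to standardizing each token's raw score against a lightweight, position-conditioned estimate of the local score distribution before averaging; the predictor family, position features, and decision rule here are illustrative assumptions, not the paper's implementation.

```python
"""Minimal sketch of a local-calibration step before aggregation.

Assumption (not taken from the paper): a lightweight regressor predicts the
mean and spread of the human-text score distribution conditioned on token
position, and each token's raw likelihood score is standardized against that
local distribution before averaging, so regions with different statistical
structure contribute on a comparable scale."""

import numpy as np
from sklearn.linear_model import LinearRegression

def fit_local_calibrators(positions, human_scores):
    """Fit lightweight predictors of the local score mean and spread.

    positions:    (n, d) position features (e.g., normalized token index)
    human_scores: (n,)   raw likelihood scores from known human text
    """
    mean_model = LinearRegression().fit(positions, human_scores)
    residual = np.abs(human_scores - mean_model.predict(positions))
    std_model = LinearRegression().fit(positions, residual)
    return mean_model, std_model

def calibrated_score(positions, scores, mean_model, std_model):
    """Average of locally standardized scores; higher means more machine-like."""
    mu = mean_model.predict(positions)
    sigma = np.maximum(std_model.predict(positions), 1e-3)  # guard against zero spread
    return float(np.mean((scores - mu) / sigma))

# Toy usage with synthetic data (all numbers illustrative).
rng = np.random.default_rng(1)
pos = rng.uniform(0, 1, size=(5000, 1))                       # stand-in position feature
human_train = 0.2 + 0.6 * pos[:, 0] + rng.normal(0, 0.05, 5000)
mean_model, std_model = fit_local_calibrators(pos, human_train)

test_pos = rng.uniform(0, 1, size=(200, 1))
human_doc = 0.2 + 0.6 * test_pos[:, 0] + rng.normal(0, 0.05, 200)
machine_doc = human_doc + 0.05                                # slightly higher everywhere

print("human doc  :", round(calibrated_score(test_pos, human_doc, mean_model, std_model), 3))
print("machine doc:", round(calibrated_score(test_pos, machine_doc, mean_model, std_model), 3))
```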