Base Models Fool AI Detectors, Appearing More Human Than Instruction-Tuned Versions

ai-technology · 2026-05-20

A recent study posted on arXiv (2605.19516) finds that commercial AI text detectors, such as GPTZero and Pangram, are more likely to classify text generated by base language models as human-written than text from instruction-tuned models. The authors introduce a method called Humanization by Iterative Paraphrasing (HIP), which fine-tunes a base model as a paraphraser and applies it repeatedly to evade detection while preserving the original meaning. Across the Llama-3 and Qwen-3 model families, spanning 0.6B to 70B parameters, the approach consistently raises how human-like the detectors judge the text to be. The results suggest that current detectors may be keying on artifacts of instruction tuning, which lets base-model text go unnoticed.
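The summary does not specify HIP's selection or stopping rules, so the loop below is only a minimal sketch of iterative paraphrasing under stated assumptions. The function `iterative_paraphrase`, the thresholds `min_similarity` and `target_score`, and the toy stand-ins `toy_paraphrase`, `toy_human_score`, and `toy_similarity` are all hypothetical: in the study the paraphraser is a fine-tuned base model and the score comes from detectors such as GPTZero or Pangram, neither of which is called here. The sketch assumes each round paraphrases the previous output, rejects any rewrite that drifts too far from the original text, and stops once the score stops improving or reaches a target.

```python
from typing import Callable, List, Tuple


# Hypothetical sketch, not the HIP authors' code. The selection and
# stopping rules below are assumptions made for illustration.
def iterative_paraphrase(
    text: str,
    paraphrase: Callable[[str], str],
    human_score: Callable[[str], float],
    similarity: Callable[[str, str], float],
    max_rounds: int = 5,
    min_similarity: float = 0.5,
    target_score: float = 0.95,
) -> Tuple[str, List[float]]:
    """Paraphrase `text` repeatedly, keeping a rewrite only if it stays close
    to the original and raises the (assumed) human-likeness score."""
    best, best_score = text, human_score(text)
    history = [best_score]
    for _ in range(max_rounds):
        if best_score >= target_score:
            break  # already scored as human-like enough
        candidate = paraphrase(best)  # each round rewrites the previous output
        if similarity(text, candidate) < min_similarity:
            break  # meaning drifted too far from the original input
        score = human_score(candidate)
        if score <= best_score:
            break  # no gain in human-likeness; stop early
        best, best_score = candidate, score
        history.append(best_score)
    return best, history


# Toy stand-ins (hypothetical): "marker" words play the role of
# instruction-tuning artifacts that a detector might pick up on.
MARKERS = {"delve": "dig", "utilize": "use", "moreover": "also"}


def toy_paraphrase(s: str) -> str:
    """Swap the first marker word for a plainer synonym."""
    words = s.split()
    for i, w in enumerate(words):
        if w in MARKERS:
            words[i] = MARKERS[w]
            break
    return " ".join(words)


def toy_human_score(s: str) -> float:
    """1.0 minus the fraction of words that are marker words."""
    words = s.split()
    if not words:
        return 1.0
    return 1.0 - sum(w in MARKERS for w in words) / len(words)


def toy_similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets, a crude proxy for semantic similarity."""
    wa, wb = set(a.split()), set(b.split())
    if not (wa | wb):
        return 1.0
    return len(wa & wb) / len(wa | wb)


if __name__ == "__main__":
    text = "we delve into the data and utilize tools"
    print(iterative_paraphrase(text, toy_paraphrase, toy_human_score, toy_similarity))
    # → ('we dig into the data and use tools', [0.75, 0.875, 1.0])
```

Measuring similarity against the original input, rather than against the previous round, is one way to keep meaning drift from compounding across iterations; raising `min_similarity` to 0.7 in the example stops the loop after the first accepted rewrite.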

Key facts

Study published on arXiv with ID 2605.19516
Evaluated GPTZero and Pangram detectors
Base models appear more human than instruction-tuned models
Proposed HIP pipeline fine-tunes base model as paraphraser
Tested on Llama-3 and Qwen-3 families from 0.6B to 70B parameters
HIP improves the trade-off between semantic preservation and detector evasion
Detectors may track artifacts of instruction tuning
Implications for academic integrity workflows

Entities

Institutions

arXiv
GPTZero
Pangram

Models

Llama-3
Qwen-3

Sources

arXiv cs.AI — 2026-05-20