ARTFEED — Contemporary Art Intelligence

Base Models Fool AI Detectors, Appearing More Human Than Fine-Tuned Versions

ai-technology · 2026-05-20

A recent study on arXiv (2605.19516) reports that commercial AI text detectors, such as GPTZero and Pangram, are more likely to classify text generated by base language models as human-written than text from instruction-tuned models. The authors introduce Humanization by Iterative Paraphrasing (HIP), which fine-tunes a base model as a paraphraser and applies it repeatedly to evade detection while preserving the original meaning. The approach consistently raises detector-assigned human-likeness across the Llama-3 and Qwen-3 model families, at sizes from 0.6B to 70B parameters. The results suggest that existing detectors key on artifacts specific to instruction-tuned outputs, which lets base-model text pass undetected.
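The iterative loop described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the callables `paraphrase`, `human_score`, and `similarity`, the thresholds, and the stopping rule are all hypothetical stand-ins, since the summary does not specify how the study scores or halts each round.

```python
from typing import Callable, List, Tuple

# One record per accepted text: (text, detector human-likeness, similarity to original)
Step = Tuple[str, float, float]


def iterative_paraphrase(
    text: str,
    paraphrase: Callable[[str], str],         # hypothetical: fine-tuned base model used as paraphraser
    human_score: Callable[[str], float],      # hypothetical: detector's "human-written" score in [0, 1]
    similarity: Callable[[str, str], float],  # hypothetical: semantic similarity in [0, 1]
    max_rounds: int = 5,                      # assumed cap on paraphrasing rounds
    target_score: float = 0.9,                # assumed "human enough" threshold
    min_similarity: float = 0.8,              # assumed floor on meaning preservation
) -> Tuple[str, List[Step]]:
    """Paraphrase repeatedly until the detector score is high enough,
    the meaning drifts too far, or the round budget runs out."""
    original = text
    current = text
    history: List[Step] = [(current, human_score(current), 1.0)]

    for _ in range(max_rounds):
        # Stop once the latest accepted text already looks human enough.
        if history[-1][1] >= target_score:
            break

        candidate = paraphrase(current)
        sim = similarity(original, candidate)

        # Reject a candidate that no longer preserves the original meaning
        # and keep the last accepted text instead.
        if sim < min_similarity:
            break

        current = candidate
        history.append((current, human_score(current), sim))

    return current, history
```

Measuring similarity against the original text, not the previous round, is a design choice made here so that small drifts cannot accumulate unnoticed over many rounds. The returned history makes the trade-off between detector evasion and semantic preservation visible at each step.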

Key facts

  • Study published on arXiv with ID 2605.19516
  • Evaluated GPTZero and Pangram detectors
  • Base models appear more human than instruction-tuned models
  • Proposed HIP pipeline fine-tunes base model as paraphraser
  • Tested on Llama-3 and Qwen-3 families from 0.6B to 70B parameters
  • HIP improves trade-off between semantic preservation and detector evasion
  • Detectors may track artifacts of instruction tuning
  • Implications for academic integrity workflows

Entities

Institutions

  • arXiv
  • GPTZero
  • Pangram
  • Llama-3
  • Qwen-3

Sources