ARTFEED — Contemporary Art Intelligence

DeGenTWeb Reveals Widespread LLM-Generated Content on Websites

ai-technology · 2026-05-04

A new research paper, DeGenTWeb, presents a systematic method for identifying LLM-dominant websites: sites whose content is mostly generated by large language models (LLMs) with minimal human input. The authors argue that previous claims about the prevalence of LLM content relied on non-representative samples and opaque methodologies, and that LLM text detectors perform worse than advertised once they are tuned to minimize false attributions of human-written text to LLMs. DeGenTWeb adapts these detectors to web pages and aggregates their results across multiple pages of a site to reach an accurate site-level categorization. The study finds that LLM-dominant sites are highly prevalent, though the abstract gives no specific numbers. The paper is available on arXiv under the identifier 2605.00087.
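The idea of aggregating page-level detector results into a site-level label can be illustrated with a minimal Python sketch. This is not the paper's actual procedure, which the abstract does not spell out. The function name classify_site, the per-page score scale (0 to 1), and the three cutoffs page_threshold, site_fraction, and min_pages are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteVerdict:
    site: str
    pages_scored: int
    flagged_fraction: float  # share of pages whose score met the page threshold
    llm_dominant: bool


def classify_site(
    site: str,
    page_scores: List[float],
    page_threshold: float = 0.9,  # hypothetical: score at which a page counts as LLM-written
    site_fraction: float = 0.5,   # hypothetical: share of flagged pages needed for the site label
    min_pages: int = 5,           # hypothetical: refuse to label sites with too little evidence
) -> SiteVerdict:
    """Aggregate per-page detector scores (assumed to lie in [0, 1]) into one site verdict."""
    n = len(page_scores)
    flagged = sum(1 for score in page_scores if score >= page_threshold)
    fraction = flagged / n if n else 0.0
    # A site is labeled LLM-dominant only with enough pages AND enough flagged pages.
    dominant = n >= min_pages and fraction >= site_fraction
    return SiteVerdict(site, n, fraction, dominant)


if __name__ == "__main__":
    verdict = classify_site("a.example", [0.95, 0.97, 0.20, 0.99, 0.93, 0.10])
    print(verdict)
```

Requiring a minimum number of pages and a strict per-page cutoff is one way to keep a single misjudged page from deciding the label for a whole site. Any real deployment would need to calibrate these cutoffs against labeled data.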

Key facts

  • DeGenTWeb systematically identifies LLM-dominant websites
  • LLM-dominant sites have content generated by LLMs with little human input
  • Previous claims about LLM content prevalence relied on non-representative samples and opaque methodologies
  • LLM text detectors perform worse than advertised when minimizing false positives
  • DeGenTWeb adapts detectors for web pages and aggregates results across pages
  • LLM-dominant sites are found to be highly prevalent
  • Paper available on arXiv: 2605.00087
  • Methodology aims for accurate site-level categorization
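The point about detectors and false positives can be made concrete with a small Python sketch. It picks a detector threshold from scores on known human-written text so that the false positive rate stays under a chosen cap, then measures how many LLM-written texts are still caught. This is a generic calibration approach, not the method described in the paper. The function names threshold_for_fpr and recall_at, and the example scores, are hypothetical.

```python
from typing import List


def threshold_for_fpr(human_scores: List[float], max_fpr: float) -> float:
    """Return a threshold t such that flagging scores > t misflags at most
    max_fpr of the given human-written texts."""
    if not human_scores:
        raise ValueError("need at least one human-written score")
    if not 0.0 <= max_fpr < 1.0:
        raise ValueError("max_fpr must be in [0, 1)")
    ranked = sorted(human_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # number of human texts we may misflag
    # Only scores strictly above ranked[allowed] are flagged, so at most
    # `allowed` human texts exceed the threshold.
    return ranked[allowed]


def recall_at(threshold: float, llm_scores: List[float]) -> float:
    """Fraction of LLM-written texts flagged at the given threshold."""
    if not llm_scores:
        raise ValueError("need at least one LLM-written score")
    return sum(1 for s in llm_scores if s > threshold) / len(llm_scores)


if __name__ == "__main__":
    human = [0.1, 0.2, 0.3, 0.9]   # made-up detector scores on human text
    llm = [0.5, 0.95, 0.8]         # made-up detector scores on LLM text
    for cap in (0.25, 0.0):
        t = threshold_for_fpr(human, cap)
        print(f"max_fpr={cap}: threshold={t}, recall={recall_at(t, llm):.2f}")
```

With these made-up scores, tightening the cap from 0.25 to 0.0 raises the threshold and lowers the share of LLM texts caught, which is the kind of trade-off the key facts describe.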

Entities

Institutions

  • arXiv

Sources