ARTFEED — Contemporary Art Intelligence

LLM Stories Exhibit Low Diversity, Dominated by 'Lighthouse' Tropes

ai-technology · 2026-05-27

A recent study posted on arXiv finds that stories generated by large language models (LLMs) show remarkably low diversity. The authors analyzed 20,000 stories produced by four contemporary models in response to five prompts and report that 11 specific words appear in 88.3% of the outputs, with little variation across models. The recurring words include the names Elias, Mara, and Elara, settings such as lighthouses, and professions such as clockmaker and librarian. These words are uncommon in both published literature and pre-training datasets, but they appear frequently in preference data that existing models were likely trained on. Notably, these 'lighthouse' stories are themselves less common than the typical story in post-training data, many of which involve copyrighted characters or adult themes. The findings underscore how strongly limited datasets, combined with powerful alignment algorithms, can narrow the diversity of generated content.
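
To make the headline figure concrete, here is a minimal sketch of how a share like "88.3% of stories" could be computed. It is not the study's code. It assumes the figure counts stories containing at least one word from the list, which the summary does not spell out. The word list holds only the six examples named above (the full list of 11 is not given here), and the function names and sample stories are hypothetical.

```python
import re

# Hypothetical word list: only the examples named in this article.
# The study's full list of 11 words is not reproduced here.
TROPE_WORDS = {"elias", "mara", "elara", "lighthouse", "clockmaker", "librarian"}


def contains_trope(story: str, words=TROPE_WORDS) -> bool:
    """Return True if the story contains at least one listed word.

    Matching is exact and case-insensitive, so the plural
    "lighthouses" does not count as a match in this sketch.
    """
    tokens = set(re.findall(r"[a-z]+", story.lower()))
    return not tokens.isdisjoint(words)


def trope_share(stories, words=TROPE_WORDS) -> float:
    """Fraction of stories that contain at least one listed word."""
    if not stories:
        return 0.0
    hits = sum(contains_trope(story, words) for story in stories)
    return hits / len(stories)


# Made-up sample stories, for illustration only.
sample_stories = [
    "Elias kept the lighthouse burning through the storm.",
    "Mara repaired the old clocks one by one.",
    "A dragon slept under the hill.",
    "The librarian closed the door behind her.",
]

print(f"{trope_share(sample_stories):.1%} of sample stories contain a listed word")
```

A real analysis would also need to decide how to treat plurals, capitalization inside names, and tokenization, none of which this summary describes.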

Key facts

  • 20,000 stories sampled from four current models
  • 11 words occur in 88.3% of generated stories
  • Recurring names: Elias, Mara, Elara
  • Recurring settings: lighthouses
  • Recurring professions: clockmaker, librarian
  • These words are rare in published literature and pre-training data
  • They are frequent in preference data likely used by existing models
  • Lighthouse stories are less common than the average post-training story

Entities

Institutions

  • arXiv
