LLM Stories Exhibit Low Diversity, Dominated by 'Lighthouse' Tropes

ai-technology · 2026-05-27

A recent study published on arXiv finds that stories generated by large language models (LLMs) show remarkably low diversity. The authors analyzed 20,000 stories produced by four current models from five prompts and found that 11 specific words appeared in 88.3% of the outputs, with little variation across models. The recurring terms include the names Elias, Mara, and Elara, settings such as lighthouses, and professions such as clockmaker and librarian. These words are uncommon in both published literature and pre-training datasets, but they are frequent in preference data likely used by existing models. Notably, these 'lighthouse' stories are infrequent relative to typical post-training stories, many of which involve copyrighted characters or adult themes. The findings highlight how strongly limited datasets, combined with strong alignment algorithms, can reduce the diversity of generated content.
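The headline figure is a coverage statistic: the share of stories that contain marker words from a fixed list. A minimal sketch of such a measurement follows. It is not the study's code. The marker list holds only the six terms named above (the full list of 11 is not given here), and the "at least one marker" criterion, the whole-word matching, the plural handling, and the sample stories are all assumptions made for illustration.

```python
import re

# Hypothetical marker list: only the six terms named in this summary.
# The study's full list of 11 words is not reproduced here.
MARKERS = {"elias", "mara", "elara", "lighthouse", "clockmaker", "librarian"}


def contains_marker(story: str, markers=MARKERS) -> bool:
    """Return True if any marker appears as a whole word (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", story.lower()))
    # Assumption: simple plurals such as "lighthouses" count as a match.
    return any(m in words or m + "s" in words for m in markers)


def marker_share(stories, markers=MARKERS) -> float:
    """Fraction of stories that contain at least one marker word."""
    if not stories:
        return 0.0
    return sum(contains_marker(s, markers) for s in stories) / len(stories)


# Made-up sample stories, for illustration only.
stories = [
    "Elara kept the lighthouse burning through the storm.",
    "The clockmaker wound every clock in the village.",
    "A fox crossed the frozen river at dawn.",
    "Two lighthouses blinked at each other across the bay.",
]
print(marker_share(stories))  # 0.75
```

Matching on whole words rather than substrings keeps a word like "marathon" from counting as "Mara". Applying the same function to each model's outputs separately would give a simple way to compare the share across models.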

Key facts

20,000 stories sampled from four current models
11 words occur in 88.3% of generated stories
Recurring names: Elias, Mara, Elara
Recurring settings: lighthouses
Recurring professions: clockmaker, librarian
Tokens rare in published literature and pre-training data
Tokens frequent in preference data likely used by existing models
Lighthouse stories infrequent compared to average post-training story
