OpenAI Traces Mysterious Goblin Metaphors to Nerdy Personality Training
OpenAI has traced a strange pattern in its GPT models, an increasing tendency to mention goblins, gremlins, and other creatures in responses, to reinforcement learning rewards for a specific personality feature. Starting with GPT-5.1 in November 2025, users reported odd overfamiliarity and verbal tics. Analysis showed that mentions of 'goblin' in ChatGPT rose by 175% and mentions of 'gremlin' by 52% after the launch.
The behavior escalated with GPT-5.4, prompting an internal investigation that identified the source: the 'Nerdy' personality, which accounted for only 2.5% of all ChatGPT responses but 66.7% of all 'goblin' mentions. The Nerdy personality's system prompt encouraged playful, quirky language, and the reward signal for that style inadvertently favored outputs containing creature words. This reward bias was present in 76.2% of training datasets. The behavior then spread to non-Nerdy contexts through reinforcement learning and supervised fine-tuning, creating a feedback loop.
OpenAI retired the Nerdy personality in March 2026, removed the reward signal that favored goblin mentions, and filtered its training data. However, GPT-5.5, which began training before the root cause was found, still exhibited the tic, so a developer-prompt instruction was added to mitigate it in Codex. The investigation also led to new tools for auditing model behavior and fixing root causes.
Key facts
- GPT-5.1 launched in November 2025.
- Goblin mentions in ChatGPT rose 175% after GPT-5.1 launch.
- Gremlin mentions rose 52% after GPT-5.1 launch.
- Nerdy personality accounted for 2.5% of ChatGPT responses but 66.7% of goblin mentions.
- Nerdy personality reward signal favored creature words in 76.2% of training datasets.
- Behavior transferred to non-Nerdy contexts via RL and SFT.
- Nerdy personality retired in March 2026.
- GPT-5.5 began training before the root cause was found and still showed the tic; a developer-prompt mitigation was added for Codex.
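To put the percentages above in proportion, the short Python sketch below works through the arithmetic. It is an illustration only, not OpenAI's analysis: the function names `lift` and `growth_factor` are hypothetical, and it assumes the 2.5% and 66.7% figures describe shares of the same pool of ChatGPT responses and 'goblin' mentions.

```python
# Illustrative arithmetic on the figures quoted in this summary.
# Function names are hypothetical; this is not OpenAI's methodology.

def lift(share_of_mentions: float, share_of_responses: float) -> float:
    """Share of mentions divided by share of traffic (1.0 means proportional)."""
    if not 0 < share_of_responses <= 1:
        raise ValueError("share_of_responses must be in (0, 1]")
    return share_of_mentions / share_of_responses


def growth_factor(percent_increase: float) -> float:
    """Convert 'rose by N%' into a multiple of the earlier level."""
    return 1 + percent_increase / 100


# Nerdy personality: 2.5% of responses, 66.7% of 'goblin' mentions.
nerdy_lift = lift(0.667, 0.025)

# Everything else: the remaining 97.5% of responses and 33.3% of mentions.
other_lift = lift(1 - 0.667, 1 - 0.025)

# How much more often a Nerdy response mentioned goblins than a non-Nerdy one.
relative_rate = nerdy_lift / other_lift

print(f"Nerdy lift:       {nerdy_lift:.1f}x")
print(f"Non-Nerdy lift:   {other_lift:.2f}x")
print(f"Relative rate:    {relative_rate:.1f}x")
print(f"'goblin' growth:  {growth_factor(175):.2f}x the earlier level")
print(f"'gremlin' growth: {growth_factor(52):.2f}x the earlier level")
```

Under those assumptions, Nerdy responses carried roughly 27 times their proportional share of 'goblin' mentions and were on the order of 78 times more likely to contain the word than other responses. A 175% rise means mentions reached 2.75 times their earlier level, not 1.75 times.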
Entities
Institutions
- OpenAI
- ChatGPT
- Codex