Talkie, a 13B Language Model Trained on Pre-1931 Texts, Released
Researchers Nick Levine, David Duvenaud, and Alec Radford (of GPT fame) have released Talkie, a 13-billion-parameter language model trained exclusively on pre-1931 English texts. The base model, talkie-1930-13b-base (53.1 GB), was trained on 260 billion tokens of historical text. A fine-tuned variant, talkie-1930-13b-it (26.6 GB), improves conversational ability using instruction-response pairs extracted from pre-1931 reference works along with synthetic data generated by modern LLMs (Claude Sonnet 4.6 and Claude Opus 4.6). Both models are Apache 2.0 licensed. The project aims to create a 'vegan' LLM trained on out-of-copyright data, though the chat model currently depends on non-vegan models for its fine-tuning; the team hopes future versions will instead use vintage base models as judges, making post-training fully era-appropriate. A demo is available online, and the project was announced by Simon Willison on April 28, 2026.
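For readers who want to try the weights locally, here is a minimal sketch using the Hugging Face transformers library. The Hub repository id and the prompt are placeholders: the announcement names the models but not where the weights are hosted.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub path -- the announcement gives model names but no hosting location.
MODEL_ID = "talkie/talkie-1930-13b-it"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Ask something a pre-1931 corpus can plausibly answer.
prompt = "Explain how a wireless telegraph works."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```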
Key facts
- Talkie is a 13B parameter language model trained on pre-1931 English text.
- Base model trained on 260B tokens of historical text.
- Fine-tuned version uses instruction-response pairs from pre-1931 reference works.
- Synthetic data from Claude Sonnet 4.6 and Claude Opus 4.6 used for fine-tuning.
- Both models are Apache 2.0 licensed.
- Training data is entirely out of copyright (US cutoff: January 1, 1931; see the date-filter sketch after this list).
- Project aims for 'vegan' LLM trained on licensed or out-of-copyright data.
- Demo available; project announced by Simon Willison on April 28, 2026.
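To make the copyright cutoff concrete, below is a hedged sketch of the kind of publication-date filter such a corpus implies, assuming per-document year metadata. The Document structure and field names are hypothetical; the announcement does not describe the team's actual data pipeline.

```python
from dataclasses import dataclass

CUTOFF_YEAR = 1931  # US public-domain cutoff cited in the announcement


@dataclass
class Document:
    title: str
    publication_year: int
    text: str


def in_public_domain(doc: Document) -> bool:
    """Keep only works published before the 1931 cutoff."""
    return doc.publication_year < CUTOFF_YEAR


corpus = [
    Document("The Great Gatsby", 1925, "..."),
    Document("Brave New World", 1932, "..."),
]
training_docs = [d for d in corpus if in_public_domain(d)]
print([d.title for d in training_docs])  # ['The Great Gatsby']
```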
Entities
People
- Nick Levine
- David Duvenaud
- Alec Radford
- Simon Willison