Build Your Own GPT from Scratch in a Single Workshop
A new hands-on workshop guides participants through building a working GPT model from scratch, writing every component of the training pipeline themselves. Based on Andrej Karpathy's nanoGPT, which reproduces GPT-2 (124M parameters) in roughly 300 lines of PyTorch, the project strips the architecture down to a ~10 million parameter model that trains on a laptop in under an hour.

Participants build a character-level tokenizer, a transformer model (embeddings, attention, and feed-forward layers), a training loop (forward pass, loss, backpropagation, optimizer, and learning rate scheduling), and a text generator capable of producing Shakespeare-like text. No black-box libraries are used; everything is written in Python with PyTorch. The workshop is designed for anyone comfortable reading Python code, with no prior machine learning experience required.

Training automatically uses an Apple Silicon GPU (MPS), an NVIDIA GPU (CUDA), or the CPU, and the project also runs on Google Colab. Additional references include Karpathy's microgpt (200 lines of pure Python), nanochat (a ChatGPT clone), the original 'Attention Is All You Need' paper (2017), the GPT-2 paper (2019), and the TinyStories paper on training small models with curated data.
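Of the components listed above, the character-level tokenizer is the simplest: the vocabulary is just the set of distinct characters in the corpus. The sketch below illustrates the idea; the stand-in text and variable names are assumptions for illustration, not the workshop's code (on the full Shakespeare corpus the same mapping yields vocab_size = 65).

```python
# Minimal character-level tokenizer of the kind described above.
# The corpus here is a short stand-in string, not the full Shakespeare text.
text = "To be, or not to be, that is the question."

chars = sorted(set(text))                        # distinct characters form the vocabulary
vocab_size = len(chars)
stoi = {ch: i for i, ch in enumerate(chars)}     # char -> integer id
itos = {i: ch for i, ch in enumerate(chars)}     # integer id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

assert decode(encode("to be or not")) == "to be or not"
```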
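The training loop likewise reduces to a few lines once a model is in place. The sketch below shows the structure named above (forward pass, loss, backprop, optimizer step, learning rate scheduling) with a trivial bigram lookup table standing in for the transformer; all names, data, and hyperparameters are placeholders rather than the workshop's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder setup: random token ids stand in for the encoded Shakespeare text,
# and an embedding table stands in for the workshop's transformer.
vocab_size, block_size, batch_size = 65, 256, 32
data = torch.randint(0, vocab_size, (10_000,))

def get_batch():
    ix = torch.randint(0, len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + block_size + 1] for i in ix])
    return x, y

model = nn.Embedding(vocab_size, vocab_size)     # maps each token id to next-token logits
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for step in range(1000):
    xb, yb = get_batch()
    logits = model(xb)                           # forward pass: (B, T, vocab_size)
    loss = F.cross_entropy(logits.view(-1, vocab_size), yb.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()                              # backpropagation
    optimizer.step()                             # optimizer update
    scheduler.step()                             # learning rate schedule
```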
Key facts
- Workshop builds a ~10M parameter GPT model from scratch
- Trains on a laptop in under an hour
- Uses character-level tokenization on Shakespeare (vocab_size=65, block_size=256)
- No black-box libraries; all code written in Python with PyTorch
- Based on Andrej Karpathy's nanoGPT project
- Runs on Apple Silicon GPU (MPS), NVIDIA GPU (CUDA), or CPU (see the device-selection sketch after this list)
- Also works on Google Colab
- References include 'Attention Is All You Need' (2017) and GPT-2 paper (2019)
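The automatic device selection noted above typically reduces to a short preference chain. The sketch below is an assumption about that logic, including the preference order, not the workshop's exact code.

```python
import torch

# Pick an available backend; the order shown is an assumed preference.
if torch.cuda.is_available():
    device = "cuda"                  # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = "mps"                   # Apple Silicon GPU
else:
    device = "cpu"                   # fallback

print(f"training on {device}")
# The model and each batch are then moved with .to(device) before the forward pass.
```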
Entities
People
- Andrej Karpathy