What Happened
OpenAI announced GPT-2, a 1.5-billion-parameter language model trained on 40GB of internet text. The model could generate remarkably coherent and contextually appropriate text across a wide range of domains. OpenAI initially withheld the full model, citing concerns about potential misuse for generating disinformation, and released it in stages over the following nine months.
Why It Matters
GPT-2 was a watershed moment for both AI capabilities and AI safety discourse. It demonstrated that scaling up language models produced qualitatively different behavior — the model could write essays, stories, and even code with minimal prompting. The staged release strategy sparked widespread debate about responsible AI publication practices and set precedents for how the field handles dual-use research.
Technical Details
- Architecture: Transformer decoder, scaled up from GPT-1
- Parameters: 1.5 billion (4 sizes released: 124M, 355M, 774M, 1.5B)
- Training data: WebText — 40GB of text from outbound Reddit links with 3+ karma
- Context window: 1,024 tokens
- Key innovation: Showed that language models trained at sufficient scale become zero-shot learners — given only a natural-language prompt, they can perform tasks such as translation, summarization, and question answering that they were never explicitly trained for
- Release timeline: Small model (Feb 2019) → Medium (May) → Large (Aug) → Full (Nov)
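The "1.5 billion parameters" figure falls out of the architecture numbers above. Below is a back-of-the-envelope sketch: the layer widths (d_model=1600, 48 layers for the largest model; 768, 12 for the smallest) and the 50,257-token vocabulary are from the GPT-2 paper, while the per-layer accounting assumes the standard GPT-2 layout — tied input/output embeddings, a fused Q/K/V projection, and a 4x feed-forward expansion — which reproduces the published sizes.

```python
# Back-of-the-envelope parameter count for a GPT-2-style transformer decoder.
# A sketch of the standard accounting, not OpenAI's actual code.

def gpt2_param_count(vocab_size: int, n_ctx: int, d_model: int, n_layers: int) -> int:
    """Count learnable parameters in a GPT-2-style decoder stack."""
    d_ff = 4 * d_model  # GPT-2 uses a 4x feed-forward expansion

    # Token + learned position embeddings. GPT-2 ties the output softmax
    # to the token embedding, so there is no separate output matrix.
    embeddings = vocab_size * d_model + n_ctx * d_model

    per_layer = (
        2 * d_model                             # first layer norm (gain + bias)
        + d_model * 3 * d_model + 3 * d_model   # fused Q, K, V projection
        + d_model * d_model + d_model           # attention output projection
        + 2 * d_model                           # second layer norm
        + d_model * d_ff + d_ff                 # feed-forward up-projection
        + d_ff * d_model + d_model              # feed-forward down-projection
    )

    final_ln = 2 * d_model  # layer norm after the last block
    return embeddings + n_layers * per_layer + final_ln

total = gpt2_param_count(vocab_size=50257, n_ctx=1024, d_model=1600, n_layers=48)
print(f"{total / 1e6:.0f}M parameters")  # ~1558M, i.e. the "1.5B" model
```

The same function with d_model=768 and 12 layers lands on roughly 124M, matching the smallest released checkpoint — a quick sanity check that the "4 sizes" differ only in depth and width, not in architecture.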