What Happened
OpenAI announced GPT-2, a 1.5-billion-parameter language model trained on 40GB of internet text. The model could generate remarkably coherent and contextually appropriate text across a wide range of domains. OpenAI initially withheld the full model, citing concerns about potential misuse for generating disinformation, and released it in stages over the following nine months.
Why It Matters
GPT-2 was a watershed moment for both AI capabilities and AI safety discourse. It demonstrated that scaling up language models produced qualitatively different behavior — the model could write essays, stories, and even code with minimal prompting. The staged release strategy sparked widespread debate about responsible AI publication practices and set precedents for how the field handles dual-use research.
Technical Details
- Architecture: Transformer decoder, scaled up from GPT-1
- Parameters: 1.5 billion (4 sizes released: 124M, 355M, 774M, 1.5B)
- Training data: WebText — 40GB of text from outbound Reddit links with 3+ karma
- Context window: 1,024 tokens
- Key innovation: Showed that language models trained at sufficient scale become zero-shot learners — given only a natural-language prompt, they can perform tasks such as translation, summarization, and question answering that they were never explicitly trained for
- Release timeline: Small model (Feb 2019) → Medium (May) → Large (Aug) → Full (Nov)
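The "1.5 billion parameters" figure falls out of the architecture numbers above. Below is a back-of-the-envelope sketch: the layer widths (d_model=1600, 48 layers for the largest model; 768, 12 for the smallest) and the 50,257-token vocabulary are from the GPT-2 paper, while the per-layer accounting assumes the standard GPT-2 layout — tied input/output embeddings, a fused Q/K/V projection, and a 4x feed-forward expansion — which reproduces the published sizes.

```python
# Back-of-the-envelope parameter count for a GPT-2-style transformer decoder.
# A sketch of the standard accounting, not OpenAI's actual code.

def gpt2_param_count(vocab_size: int, n_ctx: int, d_model: int, n_layers: int) -> int:
    """Count learnable parameters in a GPT-2-style decoder stack."""
    d_ff = 4 * d_model  # GPT-2 uses a 4x feed-forward expansion

    # Token + learned position embeddings. GPT-2 ties the output softmax
    # to the token embedding, so there is no separate output matrix.
    embeddings = vocab_size * d_model + n_ctx * d_model

    per_layer = (
        2 * d_model                             # first layer norm (gain + bias)
        + d_model * 3 * d_model + 3 * d_model   # fused Q, K, V projection
        + d_model * d_model + d_model           # attention output projection
        + 2 * d_model                           # second layer norm
        + d_model * d_ff + d_ff                 # feed-forward up-projection
        + d_ff * d_model + d_model              # feed-forward down-projection
    )

    final_ln = 2 * d_model  # layer norm after the last block
    return embeddings + n_layers * per_layer + final_ln

total = gpt2_param_count(vocab_size=50257, n_ctx=1024, d_model=1600, n_layers=48)
print(f"{total / 1e6:.0f}M parameters")  # ~1558M, i.e. the "1.5B" model
```

The same function with d_model=768 and 12 layers lands on roughly 124M, matching the smallest released checkpoint — a quick sanity check that the "4 sizes" differ only in depth and width, not in architecture.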