What Happened
OpenAI published "Language Models are Few-Shot Learners," introducing GPT-3 — a 175-billion-parameter autoregressive language model. GPT-3 demonstrated that scaling language models to sufficient size enabled strong performance on downstream tasks with only a few examples provided in the prompt (few-shot learning), without any gradient updates or fine-tuning.
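The few-shot setup described above is just text in, text out: a task description, a handful of worked examples, and an unfinished query for the model to complete. A minimal sketch of assembling such a prompt (the helper function and the translation examples are illustrative, not from the paper):

```python
# Sketch of a few-shot prompt: a task description, K worked examples,
# and a final query the model must complete. The model conditions on
# this text alone — no gradient updates or fine-tuning occur.

def build_few_shot_prompt(instruction, demonstrations, query):
    """Concatenate an instruction, K input->output demos, and the query."""
    lines = [instruction, ""]
    for inp, out in demonstrations:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model's continuation is the answer
    return "\n".join(lines)

demos = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", demos, "cat")
print(prompt)
```

Zero-shot and one-shot, also evaluated in the paper, are the same format with zero or one demonstration.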
Why It Matters
GPT-3 was a paradigm shift. It showed that sufficiently large language models could perform tasks from simple demonstrations in the prompt — an ability now called "in-context learning." This kicked off the scaling era in AI, with labs racing to build ever-larger models. OpenAI also made GPT-3 available via API, creating the first commercial large language model platform and spawning an entire ecosystem of AI-powered applications.
Technical Details
- Architecture: Transformer decoder with 96 layers, 175B parameters
- Training data: ~570GB of filtered text (Common Crawl, WebText2, Books1/2, Wikipedia)
- Context window: 2,048 tokens
- Training cost: Estimated $4.6M in compute
- Key findings:
  - Performance scaled smoothly with model size across three orders of magnitude
  - Few-shot performance approached or exceeded fine-tuned BERT-Large baselines on many tasks
  - Emergent abilities appeared at scale that were absent in smaller models
  - Could generate code, write creative fiction, perform arithmetic, and answer factual questions
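The headline numbers above can be cross-checked with back-of-the-envelope arithmetic. The layer count and hidden size (d_model = 12288) are from the paper; the 12·L·d² parameter approximation and the ~6 FLOPs-per-parameter-per-token training rule are standard rules of thumb, not exact accounting:

```python
# Rough cross-check of the GPT-3 headline numbers.
# Figures from the paper: 96 layers, d_model = 12288, ~300B training tokens.
# The formulas below are standard approximations, not exact counts.

n_layers = 96
d_model = 12288
vocab_size = 50257          # BPE vocabulary shared with GPT-2

# Each transformer layer holds ~12 * d_model^2 weights:
# 4 * d_model^2 for the attention projections, 8 * d_model^2 for the MLP.
per_layer = 12 * d_model ** 2
embedding = vocab_size * d_model
total_params = n_layers * per_layer + embedding
print(f"~{total_params / 1e9:.0f}B parameters")   # ≈ 175B

# Training compute: ~6 FLOPs per parameter per token (forward + backward).
tokens = 300e9
train_flops = 6 * total_params * tokens
print(f"~{train_flops:.2e} training FLOPs")       # ≈ 3e23 FLOPs, the scale
                                                  # behind the ~$4.6M estimate
```

The approximation lands within a percent or two of the advertised 175B, which is why parameter counts for dense transformers are usually quoted straight from layer count and hidden size.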