What Happened
OpenAI published "Language Models are Few-Shot Learners," introducing GPT-3 — a 175-billion-parameter autoregressive language model. GPT-3 demonstrated that scaling language models to sufficient size enabled strong performance on downstream tasks with only a few examples provided in the prompt (few-shot learning), without any gradient updates or fine-tuning.
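The few-shot setup described above is just text in, text out: a task description, a handful of worked examples, and an unfinished query for the model to complete. A minimal sketch of assembling such a prompt (the helper function and the translation examples are illustrative, not from the paper):

```python
# Sketch of a few-shot prompt: a task description, K worked examples,
# and a final query the model must complete. The model conditions on
# this text alone — no gradient updates or fine-tuning occur.

def build_few_shot_prompt(instruction, demonstrations, query):
    """Concatenate an instruction, K input->output demos, and the query."""
    lines = [instruction, ""]
    for inp, out in demonstrations:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model's continuation is the answer
    return "\n".join(lines)

demos = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", demos, "cat")
print(prompt)
```

Zero-shot and one-shot, also evaluated in the paper, are the same format with zero or one demonstration.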
Why It Matters
GPT-3 was a paradigm shift. It showed that sufficiently large language models could perform tasks from simple demonstrations in the prompt — an ability now called "in-context learning." This kicked off the scaling era in AI, with labs racing to build ever-larger models. OpenAI also made GPT-3 available via API, creating the first commercial large language model platform and spawning an entire ecosystem of AI-powered applications.
Technical Details
- Architecture: Transformer decoder with 96 layers, 175B parameters
- Training data: ~570GB of filtered text (Common Crawl, WebText2, Books1/2, Wikipedia)
- Context window: 2,048 tokens
- Training cost: Estimated $4.6M in compute
- Key findings:
  - Performance scaled smoothly with model size across three orders of magnitude
  - Few-shot performance approached or exceeded fine-tuned BERT-Large baselines on many tasks
  - Emergent abilities appeared at scale that were absent in smaller models
  - Could generate code, write creative fiction, perform arithmetic, and answer factual questions
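The headline numbers above can be cross-checked with back-of-the-envelope arithmetic. The layer count and hidden size (d_model = 12288) are from the paper; the 12·L·d² parameter approximation and the ~6 FLOPs-per-parameter-per-token training rule are standard rules of thumb, not exact accounting:

```python
# Rough cross-check of the GPT-3 headline numbers.
# Figures from the paper: 96 layers, d_model = 12288, ~300B training tokens.
# The formulas below are standard approximations, not exact counts.

n_layers = 96
d_model = 12288
vocab_size = 50257          # BPE vocabulary shared with GPT-2

# Each transformer layer holds ~12 * d_model^2 weights:
# 4 * d_model^2 for the attention projections, 8 * d_model^2 for the MLP.
per_layer = 12 * d_model ** 2
embedding = vocab_size * d_model
total_params = n_layers * per_layer + embedding
print(f"~{total_params / 1e9:.0f}B parameters")   # ≈ 175B

# Training compute: ~6 FLOPs per parameter per token (forward + backward).
tokens = 300e9
train_flops = 6 * total_params * tokens
print(f"~{train_flops:.2e} training FLOPs")       # ≈ 3e23 FLOPs, the scale
                                                  # behind the ~$4.6M estimate
```

The approximation lands within a percent or two of the advertised 175B, which is why parameter counts for dense transformers are usually quoted straight from layer count and hidden size.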