What Happened
OpenAI published "Improving Language Understanding by Generative Pre-Training" (Radford et al., 2018), introducing GPT (Generative Pre-trained Transformer). The model was a 117-million-parameter Transformer decoder trained on the BooksCorpus dataset with a language-modeling objective, then fine-tuned on downstream NLP tasks.
Why It Matters
GPT-1 established the paradigm of pre-training on large amounts of unlabeled text and then fine-tuning for specific tasks. This "pre-train then fine-tune" approach became the dominant methodology in NLP and proved that unsupervised pre-training could yield powerful general-purpose language representations. It laid the foundation for GPT-2, GPT-3, and the entire lineage of large language models.
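The two-stage recipe can be sketched in miniature: pre-train a shared backbone on next-token prediction, then reuse those same representations under a small task-specific head. This is a toy NumPy illustration with made-up sizes, not GPT-1's actual architecture — the "backbone" here is just an embedding table standing in for a Transformer decoder, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, not GPT-1's hyperparameters.
vocab, d_model, n_classes = 50, 8, 2

# Stage 1: "pre-training" — next-token prediction with a shared backbone.
# A real model would be a Transformer decoder; here the backbone is just
# an embedding matrix so the two-stage flow stays visible.
emb = rng.normal(0.0, 0.02, (vocab, d_model))   # shared representation
lm_head = emb.T                                  # output weights tied to embeddings

def next_token_loss(token: int, target: int) -> float:
    """Cross-entropy of the predicted next-token distribution (the LM objective)."""
    h = emb[token]                               # stand-in for a contextual state
    logits = h @ lm_head
    logits = logits - logits.max()               # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[target])

# Stage 2: "fine-tuning" — keep the backbone, bolt on a small task head.
clf_head = rng.normal(0.0, 0.02, (d_model, n_classes))

def classify(token: int) -> int:
    h = emb[token]                               # reuse the pre-trained features
    return int(np.argmax(h @ clf_head))

loss = next_token_loss(3, 7)    # pre-training signal: positive cross-entropy
label = classify(3)             # fine-tuning signal: a task label
```

The point of the sketch is the shared `emb`: pre-training shapes it, and fine-tuning reuses it, which is exactly the transfer that GPT-1 demonstrated at scale.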
Technical Details
- Architecture: 12-layer Transformer decoder with 117M parameters
- Training data: BooksCorpus (~7,000 unpublished books)
- Training objective: Next-token prediction (autoregressive language modeling)
- Innovation: Showed that a single pre-trained model could be fine-tuned to achieve competitive results across 12 different NLP benchmarks, including natural language inference, question answering, and text classification
- Results: Achieved state-of-the-art on 9 of 12 tasks evaluated
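The 117M figure can be checked with back-of-the-envelope arithmetic from the paper's reported hyperparameters (12 layers, 768-dimensional states, 3072-unit feed-forward inner layers, 512-token context). The vocabulary size of 40,478 — a ~40,000-merge BPE vocabulary plus special tokens — is taken from the released code and treated here as an assumption.

```python
# Approximate parameter count for GPT-1 from its published hyperparameters.
# vocab=40478 is an assumption (BPE merges + special tokens in the release).
n_layers, d_model, d_ff, n_ctx, vocab = 12, 768, 3072, 512, 40478

token_emb = vocab * d_model                 # input embeddings (tied with output head)
pos_emb = n_ctx * d_model                   # learned positional embeddings

per_layer = (
    3 * d_model * d_model + 3 * d_model     # Q, K, V projections (+ biases)
    + d_model * d_model + d_model           # attention output projection
    + d_model * d_ff + d_ff                 # feed-forward up-projection
    + d_ff * d_model + d_model              # feed-forward down-projection
    + 2 * 2 * d_model                       # two LayerNorms (scale + shift)
)

total = token_emb + pos_emb + n_layers * per_layer
print(f"{total / 1e6:.1f}M parameters")     # ≈ 116.5M, i.e. the "117M" figure
```

The sum lands at roughly 116.5M, consistent with the commonly quoted 117M; most parameters sit in the 12 Transformer blocks (~85M) and the token embeddings (~31M).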