GPT-1

OpenAI releases GPT-1, demonstrating that generative pre-training on unlabeled text followed by fine-tuning can achieve strong NLP performance.

Model Release

What Happened

OpenAI published "Improving Language Understanding by Generative Pre-Training," introducing GPT (Generative Pre-trained Transformer). The model was a 117-million-parameter Transformer decoder trained on the BooksCorpus dataset with a language modeling objective (predicting each next token from its preceding context), then fine-tuned on downstream NLP tasks such as classification and entailment.
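The two-stage recipe can be sketched in miniature. The toy below uses a count-based bigram model as a stand-in for the Transformer decoder, and reuses the pre-trained model's log-probability score as a feature for a downstream judgment; this is an illustrative analogy, not the paper's actual architecture or fine-tuning procedure (GPT-1 fine-tunes the full network with an added linear head).

```python
from collections import Counter
import math

# Stage 1: generative pre-training -- maximize sum(log P(u_i | context))
# over unlabeled text. A smoothed bigram count model stands in for the
# 117M-parameter Transformer decoder (illustrative assumption).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = set(corpus)

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def lm_prob(prev, word):
    # P(word | prev) with add-one smoothing over the vocabulary
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

# Language-modeling objective on the pre-training corpus:
# the sum of log-probabilities of each next token.
lm_log_likelihood = sum(
    math.log(lm_prob(p, w)) for p, w in zip(corpus, corpus[1:])
)

# Stage 2: adaptation to a downstream task -- here, scoring whether
# a new sentence "sounds like" the pre-training distribution, reusing
# what was learned in stage 1 rather than training from scratch.
def sentence_score(sentence):
    toks = sentence.split()
    return sum(math.log(lm_prob(p, w)) for p, w in zip(toks, toks[1:]))

print(sentence_score("the cat sat") > sentence_score("mat dog the"))
```

The point of the sketch is the division of labor: the expensive objective is computed once on unlabeled text, and the downstream step only reuses those learned statistics, mirroring how GPT-1's fine-tuning starts from pre-trained weights instead of random initialization.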

Why It Matters

GPT-1 established the paradigm of pre-training on large amounts of unlabeled text and then fine-tuning for specific tasks. This "pre-train then fine-tune" approach became the dominant methodology in NLP and proved that unsupervised pre-training could yield powerful general-purpose language representations. It laid the foundation for GPT-2, GPT-3, and the entire lineage of large language models.

Technical Details