What Happened
OpenAI published "Improving Language Understanding by Generative Pre-Training" (Radford et al., 2018), introducing GPT (Generative Pre-trained Transformer). The model was a 117-million-parameter Transformer decoder trained on the BooksCorpus dataset with a language-modeling objective, then fine-tuned on downstream NLP tasks.
Why It Matters
GPT-1 established the paradigm of pre-training on large amounts of unlabeled text and then fine-tuning for specific tasks. This "pre-train then fine-tune" approach became the dominant methodology in NLP and proved that unsupervised pre-training could yield powerful general-purpose language representations. It laid the foundation for GPT-2, GPT-3, and the entire lineage of large language models.
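The two-stage recipe can be sketched in miniature: pre-train a shared backbone on next-token prediction, then reuse those same representations under a small task-specific head. This is a toy NumPy illustration with made-up sizes, not GPT-1's actual architecture — the "backbone" here is just an embedding table standing in for a Transformer decoder, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, not GPT-1's hyperparameters.
vocab, d_model, n_classes = 50, 8, 2

# Stage 1: "pre-training" — next-token prediction with a shared backbone.
# A real model would be a Transformer decoder; here the backbone is just
# an embedding matrix so the two-stage flow stays visible.
emb = rng.normal(0.0, 0.02, (vocab, d_model))   # shared representation
lm_head = emb.T                                  # output weights tied to embeddings

def next_token_loss(token: int, target: int) -> float:
    """Cross-entropy of the predicted next-token distribution (the LM objective)."""
    h = emb[token]                               # stand-in for a contextual state
    logits = h @ lm_head
    logits = logits - logits.max()               # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[target])

# Stage 2: "fine-tuning" — keep the backbone, bolt on a small task head.
clf_head = rng.normal(0.0, 0.02, (d_model, n_classes))

def classify(token: int) -> int:
    h = emb[token]                               # reuse the pre-trained features
    return int(np.argmax(h @ clf_head))

loss = next_token_loss(3, 7)    # pre-training signal: positive cross-entropy
label = classify(3)             # fine-tuning signal: a task label
```

The point of the sketch is the shared `emb`: pre-training shapes it, and fine-tuning reuses it, which is exactly the transfer that GPT-1 demonstrated at scale.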
Technical Details
- Architecture: 12-layer Transformer decoder with 117M parameters
- Training data: BooksCorpus (~7,000 unpublished books)
- Training objective: Next-token prediction (autoregressive language modeling)
- Innovation: Showed that a single pre-trained model could be fine-tuned to achieve competitive results across 12 different NLP benchmarks, including natural language inference, question answering, and text classification
- Results: Achieved state-of-the-art on 9 of 12 tasks evaluated
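The 117M figure can be checked with back-of-the-envelope arithmetic from the paper's reported hyperparameters (12 layers, 768-dimensional states, 3072-unit feed-forward inner layers, 512-token context). The vocabulary size of 40,478 — a ~40,000-merge BPE vocabulary plus special tokens — is taken from the released code and treated here as an assumption.

```python
# Approximate parameter count for GPT-1 from its published hyperparameters.
# vocab=40478 is an assumption (BPE merges + special tokens in the release).
n_layers, d_model, d_ff, n_ctx, vocab = 12, 768, 3072, 512, 40478

token_emb = vocab * d_model                 # input embeddings (tied with output head)
pos_emb = n_ctx * d_model                   # learned positional embeddings

per_layer = (
    3 * d_model * d_model + 3 * d_model     # Q, K, V projections (+ biases)
    + d_model * d_model + d_model           # attention output projection
    + d_model * d_ff + d_ff                 # feed-forward up-projection
    + d_ff * d_model + d_model              # feed-forward down-projection
    + 2 * 2 * d_model                       # two LayerNorms (scale + shift)
)

total = token_emb + pos_emb + n_layers * per_layer
print(f"{total / 1e6:.1f}M parameters")     # ≈ 116.5M, i.e. the "117M" figure
```

The sum lands at roughly 116.5M, consistent with the commonly quoted 117M; most parameters sit in the 12 Transformer blocks (~85M) and the token embeddings (~31M).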