InstructGPT and RLHF

OpenAI publishes InstructGPT, showing that fine-tuning with human feedback can make models more helpful even at much smaller sizes: human raters preferred outputs from the 1.3B-parameter InstructGPT over those of the 175B-parameter GPT-3.

Research

What Happened

In March 2022, OpenAI published “Training language models to follow instructions with human feedback,” outlining a practical pipeline that combines supervised fine-tuning on human demonstrations with reinforcement learning from human feedback (RLHF).

Why It Matters

RLHF-style alignment became a standard approach for making large language models more usable in products, influencing instruction tuning, preference optimization, and evaluation practices across the industry.

Technical Details

The pipeline has three stages: supervised fine-tuning (SFT) on human-written demonstrations, training a reward model on pairwise human preference data, and optimizing the policy with reinforcement learning (PPO in the paper) to maximize the reward signal while a KL penalty keeps the policy close to the SFT model.
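The last two stages can be sketched numerically. This is a minimal illustration, not OpenAI's implementation: the reward model is trained with a pairwise (Bradley–Terry style) loss that pushes the score of the human-preferred completion above the rejected one, and during RL the reward-model score is combined with a per-token KL penalty against the SFT reference policy. The function names and the `beta` value are illustrative.

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise preference loss for reward-model training:
    mean of -log sigmoid(r_chosen - r_rejected) over comparison pairs."""
    diff = np.asarray(r_chosen) - np.asarray(r_rejected)
    # log1p(exp(-x)) is a numerically stable form of -log(sigmoid(x))
    return float(np.mean(np.log1p(np.exp(-diff))))

def penalized_reward(r_rm, logp_policy, logp_ref, beta=0.02):
    """Reward used during the RL stage: reward-model score minus a
    KL penalty that discourages drifting from the SFT reference.
    beta is an illustrative coefficient, not the paper's value."""
    return r_rm - beta * (logp_policy - logp_ref)

# Toy scores: loss is low when the preferred completion outranks
# the rejected one, and high when the ranking is inverted.
good = reward_model_loss([2.0, 1.5], [0.5, -0.3])
bad = reward_model_loss([0.5, -0.3], [2.0, 1.5])
```

A policy that assigns its completions much higher log-probability than the reference does (large `logp_policy - logp_ref`) sees its effective reward shrink, which is what keeps RLHF-trained models from reward-hacking their way far from the supervised baseline.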