What Happened
In 2014, Bahdanau, Cho, and Bengio's "align and translate" approach introduced attention into neural machine translation, allowing the model to dynamically focus on different parts of the source sentence while generating each output token.
Why It Matters
Attention became a key ingredient for handling long-range dependencies and interpretability in sequence modeling, and it directly influenced later architectures that scaled to large pretrained models.
Technical Details
The model learns soft alignments between input and output tokens: a learned scoring function compares the decoder's current state with each encoder state, and a softmax over those scores yields normalized attention weights, which produce a context vector as a weighted sum of the encoder states.
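The scoring-and-weighting step can be sketched in NumPy as additive (Bahdanau-style) attention; the function names, dimensions, and random parameters below are illustrative, not the paper's exact implementation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def additive_attention(decoder_state, encoder_states, W_d, W_e, v):
    # Score each encoder state against the current decoder state
    # with a one-hidden-layer MLP (additive scoring), then normalize.
    scores = np.tanh(encoder_states @ W_e + decoder_state @ W_d) @ v  # shape (T,)
    weights = softmax(scores)           # soft alignment over source positions
    context = weights @ encoder_states  # weighted sum of encoder states
    return context, weights

# Illustrative shapes: T source positions, encoder/decoder/hidden sizes.
rng = np.random.default_rng(0)
T, d_enc, d_dec, h = 5, 8, 6, 4
enc = rng.normal(size=(T, d_enc))
dec = rng.normal(size=(d_dec,))
ctx, w = additive_attention(dec, enc,
                            rng.normal(size=(d_dec, h)),
                            rng.normal(size=(d_enc, h)),
                            rng.normal(size=(h,)))
```

The weights sum to one, so the context vector is a convex combination of encoder states; the decoder consumes this context when predicting the next token.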