What Happened
In 2014, Bahdanau, Cho, and Bengio's "align and translate" approach introduced attention into neural machine translation, allowing the model to dynamically focus on different parts of the source sentence while generating each output token.
Why It Matters
Attention became a key ingredient for handling long-range dependencies and interpretability in sequence modeling, and it directly influenced later architectures that scaled to large pretrained models.
Technical Details
The model learns soft alignments between input and output tokens: a learned scoring function compares the decoder's current state with each encoder state, and a softmax over those scores yields normalized attention weights, which produce a context vector as a weighted sum of the encoder states.
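The scoring-and-weighting step can be sketched in NumPy as additive (Bahdanau-style) attention; the function names, dimensions, and random parameters below are illustrative, not the paper's exact implementation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def additive_attention(decoder_state, encoder_states, W_d, W_e, v):
    # Score each encoder state against the current decoder state
    # with a one-hidden-layer MLP (additive scoring), then normalize.
    scores = np.tanh(encoder_states @ W_e + decoder_state @ W_d) @ v  # shape (T,)
    weights = softmax(scores)           # soft alignment over source positions
    context = weights @ encoder_states  # weighted sum of encoder states
    return context, weights

# Illustrative shapes: T source positions, encoder/decoder/hidden sizes.
rng = np.random.default_rng(0)
T, d_enc, d_dec, h = 5, 8, 6, 4
enc = rng.normal(size=(T, d_enc))
dec = rng.normal(size=(d_dec,))
ctx, w = additive_attention(dec, enc,
                            rng.normal(size=(d_dec, h)),
                            rng.normal(size=(d_enc, h)),
                            rng.normal(size=(h,)))
```

The weights sum to one, so the context vector is a convex combination of encoder states; the decoder consumes this context when predicting the next token.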