What Happened
In September 2022, OpenAI published Whisper and open-sourced models and inference code, describing a transformer-based ASR system trained on large-scale web data.
Why It Matters
Whisper became a widely used building block for voice interfaces, transcription tools, and local speech pipelines—especially valuable for developers seeking strong baseline ASR without paid APIs.
Technical Details
Whisper is implemented as an encoder–decoder Transformer operating on audio features (e.g., log-Mel spectrograms), supporting transcription and translation tasks within a unified model.