Notes on OpenAI's Whisper

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition, speech translation, and language identification.

Code repo | Model card | Paper
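A minimal sketch of the three tasks named above, assuming the openai-whisper Python package (`pip install openai-whisper`, with ffmpeg on PATH) and a recent release whose `log_mel_spectrogram` accepts `n_mels`. The helper names (`load`, `transcribe`, `translate_to_english`, `detect_language`) and the audio path are hypothetical; only the `whisper.*` and `model.*` calls come from the package. The import is deferred into the functions so the file can be loaded without the package installed.

```python
"""Hypothetical helpers wrapping openai-whisper for its three tasks."""


def load(size: str = "base"):
    """Load a model by name, e.g. "base" or "small.en" (sizes from the table)."""
    import whisper  # deferred import: requires the openai-whisper package

    return whisper.load_model(size)


def transcribe(model, path: str) -> str:
    """Speech recognition: transcript in the spoken language.

    transcribe() walks through the whole file in 30-second windows.
    """
    return model.transcribe(path)["text"]


def translate_to_english(model, path: str) -> str:
    """Speech translation: Whisper's "translate" task always outputs English."""
    # `task` is passed through to the decoding options.
    return model.transcribe(path, task="translate")["text"]


def detect_language(model, path: str) -> str:
    """Language identification from the first 30 seconds of audio."""
    import whisper

    audio = whisper.pad_or_trim(whisper.load_audio(path))  # pad or cut to 30 s
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)  # probs: {language code: probability}
    return max(probs, key=probs.get)
```

Usage, with a placeholder file: `model = load("base")` followed by `print(detect_language(model, "speech.mp3"))`. Loading the model once and passing it around avoids reloading the weights for every call.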

Specs

Size    Parameters  English-only model  Multilingual model  Model file size  VRAM required
tiny    39 M        yes                 yes                 ~76 MB           ~1 GB
base    74 M        yes                 yes                 ~145 MB          ~1 GB
small   244 M       yes                 yes                 ~484 MB          ~2 GB
medium  769 M       yes                 yes                 ~1.5 GB          ~5 GB
large   1550 M      no                  yes                 ~3.1 GB          ~10 GB
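As one way to read the table, here is a sketch of picking a model for a given GPU. The rule "largest model whose approximate VRAM requirement fits the budget" is a hypothetical heuristic, not something Whisper itself provides; the VRAM numbers are copied from the table, and the ".en" suffix is how the English-only checkpoints are named (there is no "large.en").

```python
from typing import List, Optional, Tuple

# (size, approximate VRAM in GB, has an English-only variant), from the table.
# Ordered smallest to largest, so the last entry that fits is the largest.
MODELS: List[Tuple[str, float, bool]] = [
    ("tiny", 1.0, True),
    ("base", 1.0, True),
    ("small", 2.0, True),
    ("medium", 5.0, True),
    ("large", 10.0, False),
]


def pick_model(vram_gb: float, english_only: bool = False) -> Optional[str]:
    """Return the largest model name that fits in `vram_gb`, or None.

    With english_only=True, prefer the ".en" variant when one exists;
    large has none, so the multilingual large model is returned instead.
    """
    best: Optional[str] = None
    for size, vram_needed, has_en in MODELS:
        if vram_needed > vram_gb:
            continue  # too big for this budget
        best = f"{size}.en" if english_only and has_en else size
    return best
```

For example, `pick_model(4)` gives "small" and `pick_model(6, english_only=True)` gives "medium.en". Since the table's VRAM figures are approximate, leaving some headroom below the card's actual memory is prudent.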