This model is whisper-medium fine-tuned on 1M audio samples from the mitermix/audiosnippets dataset.
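A minimal sketch of how a Whisper checkpoint like this one might be loaded and run with the Hugging Face `transformers` library. The repository id of the fine-tuned checkpoint is not stated here, so `MODEL_ID` below is an assumption: it points at the base `openai/whisper-medium` model as a stand-in and should be replaced with the actual fine-tuned checkpoint. The helper names `generate_text` and `make_test_tone` are hypothetical, and the kind of text the fine-tuned model produces depends on how it was trained.

```python
import math

# Assumption: the fine-tuned checkpoint's repository id is not given in this card,
# so the base model id is used as a stand-in. Replace it with the real one.
MODEL_ID = "openai/whisper-medium"
SAMPLE_RATE = 16_000  # Whisper's feature extractor expects 16 kHz mono audio


def generate_text(audio, model_id=MODEL_ID):
    """Run one mono 16 kHz waveform (a list of floats) through a Whisper checkpoint."""
    # Imported lazily so this module can be imported without torch/transformers.
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    # Convert the raw waveform into log-mel input features.
    inputs = processor(audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        token_ids = model.generate(inputs.input_features)
    # Decode the generated token ids back into text.
    return processor.batch_decode(token_ids, skip_special_tokens=True)[0]


def make_test_tone(seconds=1.0, freq_hz=440.0):
    """Build a synthetic sine wave, useful only as a smoke-test input."""
    n_samples = int(seconds * SAMPLE_RATE)
    return [0.1 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]
```

Calling `generate_text(make_test_tone())` downloads the checkpoint on first use and returns whatever text the model generates for the tone. For real audio, load a file with a library of your choice and resample it to 16 kHz mono first.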