---
license: mit
language:
- ur
---

# Urdu Whisper Model: A From-Scratch PyTorch Implementation

A small Urdu Whisper model trained from scratch, based on the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf).

## ModelArgs Hyperparameters

| Parameter              | Value      | Description                                                    |
|------------------------|------------|----------------------------------------------------------------|
| `batch_size`           | 128        | Number of samples processed before the model is updated.       |
| `max_lr`               | 1.5e-3     | Maximum learning rate.                                         |
| `dropout`              | 0.1        | Dropout rate for regularization.                               |
| `epochs`               | 2          | Number of training epochs.                                     |
| `block_size`           | 64         | Sequence length (number of tokens or time steps).              |
| `tgt_vocab_size`       | 200024     | Size of the target vocabulary.                                 |
| `embeddings_dims`      | 512        | Dimensionality of token embeddings.                            |
| `attn_dropout`         | 0.1        | Dropout rate for attention layers.                             |
| `no_of_heads`          | 4          | Number of attention heads in multi-head attention.             |
| `no_of_decoder_layers` | 6          | Number of decoder layers in the model.                         |
| `weight_decay_optim`   | 0.1        | Weight decay for the optimizer.                                |
| `log_mel_features`     | 80         | Number of Mel spectrogram features.                            |
| `kernel_size`          | 3          | Kernel size for convolutional layers.                          |
| `stride`               | 2          | Stride for convolutional layers.                               |
| `sr`                   | 16000      | Sampling rate of the audio in Hz.                              |
| `device`               | `'cuda:0'` | Device to run the model on (e.g., GPU).                        |
| `SAMPLING_RATE`        | 16000      | Sampling rate of the audio in Hz.                              |
| `N_MELS`               | 80         | Number of Mel bins in the spectrogram.                         |
| `WINDOW_DURATION`      | 0.025      | Duration of the analysis window in seconds (25 ms).            |
| `STRIDE_DURATION`      | 0.010      | Stride between consecutive windows in seconds (10 ms).         |
| `max_t`                | 500        | Maximum number of time steps in the spectrogram.               |
| `n_channels`           | 80         | Number of channels in the input spectrogram.                   |

### Dataset

[Common Voice Corpus 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0)

Used the 'xs' snapshot.
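For reference, the values in the ModelArgs table can be collected into a single configuration object. The following is a minimal sketch, not the repository's actual code: the field names and defaults are copied from the table, while the four derived properties (`window_samples`, `hop_samples`, `head_dim`, `max_audio_seconds`) are hypothetical helpers added here to show what the audio settings imply for 16 kHz input.

```python
from dataclasses import dataclass


@dataclass
class ModelArgs:
    # Field names and default values are copied from the hyperparameter table.
    batch_size: int = 128
    max_lr: float = 1.5e-3
    dropout: float = 0.1
    epochs: int = 2
    block_size: int = 64
    tgt_vocab_size: int = 200024
    embeddings_dims: int = 512
    attn_dropout: float = 0.1
    no_of_heads: int = 4
    no_of_decoder_layers: int = 6
    weight_decay_optim: float = 0.1
    log_mel_features: int = 80
    kernel_size: int = 3
    stride: int = 2
    sr: int = 16000
    device: str = "cuda:0"
    SAMPLING_RATE: int = 16000
    N_MELS: int = 80
    WINDOW_DURATION: float = 0.025
    STRIDE_DURATION: float = 0.010
    max_t: int = 500
    n_channels: int = 80

    # The properties below are hypothetical helpers (assumptions for
    # illustration); they are not taken from the original implementation.
    @property
    def window_samples(self) -> int:
        """Analysis window length in samples."""
        return round(self.WINDOW_DURATION * self.SAMPLING_RATE)

    @property
    def hop_samples(self) -> int:
        """Stride between consecutive windows in samples."""
        return round(self.STRIDE_DURATION * self.SAMPLING_RATE)

    @property
    def head_dim(self) -> int:
        """Per-head dimensionality, assuming the embedding is split evenly."""
        if self.embeddings_dims % self.no_of_heads != 0:
            raise ValueError("embeddings_dims must be divisible by no_of_heads")
        return self.embeddings_dims // self.no_of_heads

    @property
    def max_audio_seconds(self) -> float:
        """Approximate audio duration covered by max_t spectrogram frames."""
        return self.max_t * self.STRIDE_DURATION


args = ModelArgs()
print(args.window_samples, args.hop_samples, args.head_dim, args.max_audio_seconds)
```

With these defaults, the 25 ms window spans 400 samples, the 10 ms stride spans 160 samples, each of the 4 attention heads gets 128 dimensions, and 500 frames at a 10 ms stride cover roughly 5 seconds of audio.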
### Frameworks:

**PyTorch**

### Epochs/Steps

Epochs (train) = 2

Validation iterations = every epoch

### Loss Curves

![Train and Val loss curves](image/loss.png)