---
license: mit
language:
- ur
---
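# Urdu Whisper: A From-Scratch PyTorch Implementation
A small Whisper-style model for Urdu speech recognition, implemented from scratch in PyTorch and trained on Urdu speech. The architecture follows [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf).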
## ModelArgs Hyperparameters
| Parameter | Value | Description |
|-------------------------|------------------------|-----------------------------------------------------------------------------|
| `batch_size` | 128 | Number of samples processed per optimizer update. |
| `max_lr` | 1.5e-3 | Maximum learning rate. |
| `dropout` | 0.1 | Dropout rate for regularization. |
| `epochs` | 2 | Number of training epochs. |
| `block_size` | 64 | Sequence length (number of tokens or time steps). |
| `tgt_vocab_size` | 200024 | Size of the target (text) vocabulary. |
| `embeddings_dims` | 512 | Dimensionality of the token embeddings. |
| `attn_dropout` | 0.1 | Dropout rate for the attention layers. |
| `no_of_heads` | 4 | Number of attention heads in multi-head attention. |
| `no_of_decoder_layers` | 6 | Number of decoder layers. |
| `weight_decay_optim` | 0.1 | Weight decay for the optimizer. |
| `log_mel_features` | 80 | Number of log-Mel spectrogram features. |
| `kernel_size` | 3 | Kernel size of the convolutional layers. |
| `stride` | 2 | Stride of the convolutional layers. |
| `sr` | 16000 | Audio sampling rate (Hz). |
| `device` | `'cuda:0'` | Device to run the model on (e.g., a GPU). |
| `SAMPLING_RATE` | 16000 | Audio sampling rate (Hz); same value as `sr`. |
| `N_MELS` | 80 | Number of Mel bins in the spectrogram. |
| `WINDOW_DURATION` | 0.025 | Analysis window length in seconds (25 ms). |
| `STRIDE_DURATION` | 0.010 | Hop between consecutive windows in seconds (10 ms). |
| `max_t` | 500 | Maximum number of time steps (frames) in the spectrogram. |
| `n_channels` | 80 | Number of channels in the input spectrogram. |
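
For reference, the table can be mirrored as a Python dataclass. This is a sketch, not the repository's actual `ModelArgs`: field names and values come from the table (the duplicated `SAMPLING_RATE`, `N_MELS` and `n_channels` entries are folded into `sr` and `log_mel_features`), while the derived helpers and the convolution padding of 1 are assumptions added to show how the numbers fit together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelArgs:
    # Training (values from the table above)
    batch_size: int = 128
    max_lr: float = 1.5e-3
    dropout: float = 0.1
    epochs: int = 2
    weight_decay_optim: float = 0.1
    # Text decoder
    block_size: int = 64
    tgt_vocab_size: int = 200024
    embeddings_dims: int = 512
    attn_dropout: float = 0.1
    no_of_heads: int = 4
    no_of_decoder_layers: int = 6
    # Audio front end
    log_mel_features: int = 80
    kernel_size: int = 3
    stride: int = 2
    sr: int = 16000
    device: str = "cuda:0"
    WINDOW_DURATION: float = 0.025
    STRIDE_DURATION: float = 0.010
    max_t: int = 500

    @property
    def head_dim(self) -> int:
        # Each attention head works on embeddings_dims / no_of_heads features.
        assert self.embeddings_dims % self.no_of_heads == 0
        return self.embeddings_dims // self.no_of_heads

    @property
    def win_length(self) -> int:
        # 25 ms analysis window at 16 kHz, in samples
        return round(self.WINDOW_DURATION * self.sr)

    @property
    def hop_length(self) -> int:
        # 10 ms hop at 16 kHz, in samples
        return round(self.STRIDE_DURATION * self.sr)

    @property
    def max_audio_seconds(self) -> float:
        # Approximate audio length covered by max_t frames (ignores the final window).
        return self.max_t * self.hop_length / self.sr

    def conv_out_len(self, t: int, padding: int = 1) -> int:
        # Conv1d output length: floor((t + 2p - k) / s) + 1.
        # ASSUMPTION: padding=1; the README does not state the padding.
        return (t + 2 * padding - self.kernel_size) // self.stride + 1
```

With the defaults, `ModelArgs().head_dim` is 128, the 25 ms window and 10 ms hop correspond to 400 and 160 samples, `max_t = 500` frames span about 5 s of audio, and a kernel-3, stride-2 convolution with padding 1 halves 500 frames to 250.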
### Dataset
[Common Voice Corpus 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0), using the `xs` snapshot.
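
Common Voice clips vary in duration, while the hyperparameters cap the spectrogram at `max_t = 500` frames. The README does not say how clips are fitted to that length; a common approach is to truncate long clips and pad short ones. The sketch below assumes zero-padding, and `pad_or_trim` is a hypothetical helper. It uses plain Python lists to stay dependency-free; in PyTorch the same idea is slicing plus `torch.nn.functional.pad`.

```python
from typing import List

Frame = List[float]  # one time step: n_mels log-Mel values


def pad_or_trim(spec: List[Frame], max_t: int = 500, n_mels: int = 80,
                pad_value: float = 0.0) -> List[Frame]:
    """Return exactly max_t frames: truncate long clips, pad short ones."""
    for frame in spec:
        if len(frame) != n_mels:
            raise ValueError(f"expected {n_mels} mel bins, got {len(frame)}")
    trimmed = spec[:max_t]  # drop frames beyond max_t
    # ASSUMPTION: pad with a constant value (0.0) up to max_t frames.
    padding = [[pad_value] * n_mels for _ in range(max_t - len(trimmed))]
    return trimmed + padding
```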
### Framework
**PyTorch**
### Epochs/Steps
- Training epochs: 2
- Validation: once per epoch
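
The schedule above (two training epochs, with validation once per epoch) reduces to a loop like the following. This is a framework-agnostic sketch: `fit`, `train_step` and `evaluate` are hypothetical names, and the actual training script may differ, for example in how the learning rate is scheduled up to `max_lr`.

```python
from typing import Callable, Iterable, List


def fit(epochs: int,
        train_batches: Callable[[], Iterable[object]],
        train_step: Callable[[object], float],
        evaluate: Callable[[], float]) -> List[dict]:
    """Run `epochs` training passes, validating once after each epoch."""
    history = []
    for epoch in range(epochs):
        # One full pass over the training data; train_step returns the batch loss.
        losses = [train_step(batch) for batch in train_batches()]
        train_loss = sum(losses) / max(len(losses), 1)
        # Exactly one validation pass per epoch.
        val_loss = evaluate()
        history.append({"epoch": epoch + 1,
                        "train_loss": train_loss,
                        "val_loss": val_loss})
    return history
```

Recording one `(train_loss, val_loss)` pair per epoch is what yields curves like the ones below.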
### Loss Curves
![Train and Val loss curves](image/loss.png)