whisper-large-v3-persian
This model is a fine-tuned version of openai/whisper-large-v3 on the Common Voice 17.0 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2499
- Wer: 26.5381
Model description
The model was fine-tuned on an RTX 6000 Ada GPU using over 200,000 samples from the Mozilla Foundation's Common Voice 17.0 dataset. Although the result improves on the Word Error Rate (WER) of other models, it still shows grammatical weaknesses, which stem from spelling errors in the dataset.
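A minimal inference sketch (not part of the original card): it assumes the standard Transformers automatic-speech-recognition pipeline and the published model id nezamisafa/whisper-large-v3-persian; the audio file path is a placeholder.

```python
# Hedged usage sketch: loads the fine-tuned checkpoint with the Transformers ASR
# pipeline and transcribes a Persian audio file ("audio.mp3" is a placeholder path).
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="nezamisafa/whisper-large-v3-persian",
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device=device,
)

# Whisper is multilingual; pinning the language avoids misdetection on short clips.
result = asr("audio.mp3", generate_kwargs={"language": "persian", "task": "transcribe"})
print(result["text"])
```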
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 6000
- mixed_precision_training: Native AMP
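For reference, a hedged sketch of how these hyperparameters map onto Transformers' Seq2SeqTrainingArguments; the output directory and any options not listed above (e.g. logging or save cadence) are assumptions, not values taken from the original run.

```python
# Sketch of the training configuration implied by the hyperparameter list above.
# output_dir and unlisted options are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-persian",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=6000,
    fp16=True,  # native AMP mixed-precision training
)
```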
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|---|---|---|---|---|
| 0.1337 | 0.8110 | 2000 | 0.2818 | 31.0620 |
| 0.0608 | 1.6221 | 4000 | 0.2532 | 28.8171 |
| 0.0229 | 2.4331 | 6000 | 0.2499 | 26.5381 |
Framework versions
- Transformers 4.52.3
- Pytorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1