---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-small
tags:
- audio
- automatic-speech-recognition
- generated_from_trainer
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
metrics:
- wer
- cer
model-index:
- name: whisper-small-ru-v4
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 21.0
      type: mozilla-foundation/common_voice_21_0
      config: ru
      split: test
      args:
        language: ru
    metrics:
    - name: Wer
      type: wer
      value: 2.0650
    - name: Cer
      type: cer
      value: 0.9906
language:
- ru
pipeline_tag: automatic-speech-recognition
datasets:
- artyomboyko/common_voice_21_0_ru
---

# whisper-small-ru-v4

> ***NOTE: EXPERIMENTAL MODEL!***
> ***This is the best model obtained at the end of the fine-tuning process. Further inference testing has not yet been performed.***

This model is a fine-tuned version of [artyomboyko/whisper-small-ru-v3](https://huggingface.co/artyomboyko/whisper-small-ru-v3) (itself a fine-tuned version of the base model [openai/whisper-small](https://huggingface.co/openai/whisper-small)) on the Russian part of the [Common Voice 21.0 dataset](https://commonvoice.mozilla.org/). It achieves the following results on the evaluation set:
- Loss: 0.0104
- WER: 2.0650
- CER: 0.9906

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Training was performed on a single [MSI GeForce RTX 4090 SUPRIM 24G](https://www.msi.com/Graphics-Card/GeForce-RTX-4090-SUPRIM-24G) GPU.

## Training procedure

Model training time: 28 h 47 min

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto a Transformers training configuration):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 250
- training_steps: 25000
- mixed_precision_training: Native AMP
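The original training script is not included here; purely as an illustration, the hyperparameters above correspond roughly to the following `Seq2SeqTrainingArguments` (a hypothetical reconstruction, with `output_dir` as a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical mapping of the hyperparameters listed above onto
# Seq2SeqTrainingArguments; not the author's original training script.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ru-v4",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",                 # AdamW, torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=250,
    max_steps=25000,
    fp16=True,                           # native AMP mixed precision
)
```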
### Training results

| Training Loss | Epoch | Step | Validation Loss | WER | CER |
|:-------------:|:------:|:-----:|:---------------:|:-------:|:------:|
| 0.0683 | 0.0387 | 500 | 0.1521 | 13.4494 | 4.4901 |
| 0.059 | 0.0774 | 1000 | 0.1434 | 12.1396 | 3.6132 |
| 0.0584 | 0.1161 | 1500 | 0.1382 | 11.9180 | 3.3839 |
| 0.0551 | 0.1547 | 2000 | 0.1314 | 11.2753 | 3.3867 |
| 0.0513 | 0.1934 | 2500 | 0.1242 | 10.6755 | 3.0711 |
| 0.0616 | 0.2321 | 3000 | 0.1199 | 10.8194 | 3.3670 |
| 0.0524 | 0.2708 | 3500 | 0.1130 | 10.0340 | 2.8311 |
| 0.0465 | 0.3095 | 4000 | 0.1057 | 10.0108 | 3.1744 |
| 0.0588 | 0.3482 | 4500 | 0.1026 | 10.1871 | 3.4398 |
| 0.0498 | 0.3868 | 5000 | 0.0951 | 8.9527 | 2.7278 |
| 0.0488 | 0.4255 | 5500 | 0.0915 | 9.2033 | 3.0227 |
| 0.0501 | 0.4642 | 6000 | 0.0876 | 8.8043 | 2.7854 |
| 0.0428 | 0.5029 | 6500 | 0.0835 | 8.3066 | 2.6446 |
| 0.0463 | 0.5416 | 7000 | 0.0793 | 7.5861 | 2.3860 |
| 0.0516 | 0.5803 | 7500 | 0.0752 | 7.8959 | 2.6551 |
| 0.0442 | 0.6190 | 8000 | 0.0702 | 7.5687 | 2.4814 |
| 0.0393 | 0.6576 | 8500 | 0.0655 | 7.0072 | 2.1594 |
| 0.0455 | 0.6963 | 9000 | 0.0606 | 6.4202 | 1.9970 |
| 0.0371 | 0.7350 | 9500 | 0.0567 | 6.7253 | 2.2651 |
| 0.041 | 0.7737 | 10000 | 0.0524 | 6.4851 | 2.1622 |
| 0.0368 | 0.8124 | 10500 | 0.0497 | 5.4596 | 1.5878 |
| 0.0397 | 0.8511 | 11000 | 0.0455 | 5.7566 | 2.1294 |
| 0.0342 | 0.8897 | 11500 | 0.0429 | 5.1382 | 1.6793 |
| 0.0322 | 0.9284 | 12000 | 0.0382 | 4.7786 | 1.5893 |
| 0.0316 | 0.9671 | 12500 | 0.0349 | 5.3842 | 2.3248 |
| 0.008 | 1.0058 | 13000 | 0.0315 | 4.2403 | 1.2860 |
| 0.0122 | 1.0445 | 13500 | 0.0303 | 4.7983 | 2.0351 |
| 0.0118 | 1.0832 | 14000 | 0.0285 | 4.9955 | 2.4634 |
| 0.0121 | 1.1219 | 14500 | 0.0285 | 5.0744 | 2.1732 |
| 0.01 | 1.1605 | 15000 | 0.0271 | 4.5906 | 1.8766 |
| 0.0093 | 1.1992 | 15500 | 0.0261 | 3.6103 | 1.3770 |
| 0.0102 | 1.2379 | 16000 | 0.0251 | 4.0651 | 1.5117 |
| 0.0106 | 1.2766 | 16500 | 0.0242 | 4.3899 | 1.8827 |
| 0.0089 | 1.3153 | 17000 | 0.0234 | 3.7252 | 1.3949 |
| 0.0078 | 1.3540 | 17500 | 0.0223 | 3.7217 | 1.6103 |
| 0.0091 | 1.3926 | 18000 | 0.0216 | 3.8284 | 1.6104 |
| 0.0096 | 1.4313 | 18500 | 0.0200 | 3.2519 | 1.5155 |
| 0.0083 | 1.4700 | 19000 | 0.0188 | 3.3168 | 1.3898 |
| 0.0072 | 1.5087 | 19500 | 0.0176 | 3.1231 | 1.4695 |
| 0.0083 | 1.5474 | 20000 | 0.0166 | 3.6625 | 1.6818 |
| 0.0111 | 1.5861 | 20500 | 0.0155 | 2.5152 | 1.1298 |
| 0.0068 | 1.6248 | 21000 | 0.0149 | 2.4142 | 0.9976 |
| 0.0055 | 1.6634 | 21500 | 0.0141 | 2.6451 | 1.3030 |
| 0.0123 | 1.7021 | 22000 | 0.0132 | 2.6289 | 1.2809 |
| 0.0079 | 1.7408 | 22500 | 0.0126 | 2.2576 | 0.9550 |
| 0.0112 | 1.7795 | 23000 | 0.0119 | 2.6149 | 1.3460 |
| 0.0087 | 1.8182 | 23500 | 0.0114 | 2.2878 | 1.1265 |
| 0.0062 | 1.8569 | 24000 | 0.0109 | 2.1903 | 1.0690 |
| 0.0051 | 1.8956 | 24500 | 0.0106 | 2.1277 | 1.0283 |
| 0.0077 | 1.9342 | 25000 | 0.0104 | 2.0650 | 0.9906 |

### Framework versions

- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.1
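With the versions listed above, a minimal transcription sketch using the standard Transformers ASR pipeline might look like the following (assumes `ffmpeg` is available for audio decoding; `audio.wav` is a hypothetical file name):

```python
from transformers import pipeline

# Load this fine-tuned checkpoint into the automatic-speech-recognition pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="artyomboyko/whisper-small-ru-v4",
)

# "audio.wav" is a placeholder; any file readable by ffmpeg works.
# Forcing the language avoids Whisper's automatic language detection.
result = asr("audio.wav", generate_kwargs={"language": "russian"})
print(result["text"])
```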