---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-small
tags:
- audio
- automatic-speech-recognition
- generated_from_trainer
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
metrics:
- wer
- cer
model-index:
- name: whisper-small-ru-v4
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 21.0
      type: mozilla-foundation/common_voice_21_0
      config: ru
      split: test
      args:
        language: ru
    metrics:
    - name: Wer
      type: wer
      value: 2.0650
    - name: Cer
      type: cer
      value: 0.9906
language:
- ru
pipeline_tag: automatic-speech-recognition
datasets:
- artyomboyko/common_voice_21_0_ru
---

# whisper-small-ru-v4

> ***NOTE: EXPERIMENTAL MODEL!***
> ***This is the best model obtained at the end of the fine-tuning process. Further inference testing has not yet been performed.***

This model is a fine-tuned version of [artyomboyko/whisper-small-ru-v3](https://huggingface.co/artyomboyko/whisper-small-ru-v3) (itself a fine-tuned version of the base model [openai/whisper-small](https://huggingface.co/openai/whisper-small)) on the Russian part of the [Common Voice 21.0 dataset](https://commonvoice.mozilla.org/). It achieves the following results on the evaluation set:
- Loss: 0.0104
- WER: 2.0650
- CER: 0.9906

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Training was performed on a single [MSI GeForce RTX 4090 SUPRIM 24G](https://www.msi.com/Graphics-Card/GeForce-RTX-4090-SUPRIM-24G) GPU.

## Training procedure

Model training time: 28 h 47 min

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto a Transformers training configuration):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 250
- training_steps: 25000
- mixed_precision_training: Native AMP
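The original training script is not included here; purely as an illustration, the hyperparameters above correspond roughly to the following `Seq2SeqTrainingArguments` (a hypothetical reconstruction, with `output_dir` as a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical mapping of the hyperparameters listed above onto
# Seq2SeqTrainingArguments; not the author's original training script.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ru-v4",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",                 # AdamW, torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=250,
    max_steps=25000,
    fp16=True,                           # native AMP mixed precision
)
```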
### Training results

| Training Loss | Epoch | Step | Validation Loss | WER | CER |
|:-------------:|:------:|:-----:|:---------------:|:-------:|:------:|
| 0.0683 | 0.0387 | 500 | 0.1521 | 13.4494 | 4.4901 |
| 0.059 | 0.0774 | 1000 | 0.1434 | 12.1396 | 3.6132 |
| 0.0584 | 0.1161 | 1500 | 0.1382 | 11.9180 | 3.3839 |
| 0.0551 | 0.1547 | 2000 | 0.1314 | 11.2753 | 3.3867 |
| 0.0513 | 0.1934 | 2500 | 0.1242 | 10.6755 | 3.0711 |
| 0.0616 | 0.2321 | 3000 | 0.1199 | 10.8194 | 3.3670 |
| 0.0524 | 0.2708 | 3500 | 0.1130 | 10.0340 | 2.8311 |
| 0.0465 | 0.3095 | 4000 | 0.1057 | 10.0108 | 3.1744 |
| 0.0588 | 0.3482 | 4500 | 0.1026 | 10.1871 | 3.4398 |
| 0.0498 | 0.3868 | 5000 | 0.0951 | 8.9527 | 2.7278 |
| 0.0488 | 0.4255 | 5500 | 0.0915 | 9.2033 | 3.0227 |
| 0.0501 | 0.4642 | 6000 | 0.0876 | 8.8043 | 2.7854 |
| 0.0428 | 0.5029 | 6500 | 0.0835 | 8.3066 | 2.6446 |
| 0.0463 | 0.5416 | 7000 | 0.0793 | 7.5861 | 2.3860 |
| 0.0516 | 0.5803 | 7500 | 0.0752 | 7.8959 | 2.6551 |
| 0.0442 | 0.6190 | 8000 | 0.0702 | 7.5687 | 2.4814 |
| 0.0393 | 0.6576 | 8500 | 0.0655 | 7.0072 | 2.1594 |
| 0.0455 | 0.6963 | 9000 | 0.0606 | 6.4202 | 1.9970 |
| 0.0371 | 0.7350 | 9500 | 0.0567 | 6.7253 | 2.2651 |
| 0.041 | 0.7737 | 10000 | 0.0524 | 6.4851 | 2.1622 |
| 0.0368 | 0.8124 | 10500 | 0.0497 | 5.4596 | 1.5878 |
| 0.0397 | 0.8511 | 11000 | 0.0455 | 5.7566 | 2.1294 |
| 0.0342 | 0.8897 | 11500 | 0.0429 | 5.1382 | 1.6793 |
| 0.0322 | 0.9284 | 12000 | 0.0382 | 4.7786 | 1.5893 |
| 0.0316 | 0.9671 | 12500 | 0.0349 | 5.3842 | 2.3248 |
| 0.008 | 1.0058 | 13000 | 0.0315 | 4.2403 | 1.2860 |
| 0.0122 | 1.0445 | 13500 | 0.0303 | 4.7983 | 2.0351 |
| 0.0118 | 1.0832 | 14000 | 0.0285 | 4.9955 | 2.4634 |
| 0.0121 | 1.1219 | 14500 | 0.0285 | 5.0744 | 2.1732 |
| 0.01 | 1.1605 | 15000 | 0.0271 | 4.5906 | 1.8766 |
| 0.0093 | 1.1992 | 15500 | 0.0261 | 3.6103 | 1.3770 |
| 0.0102 | 1.2379 | 16000 | 0.0251 | 4.0651 | 1.5117 |
| 0.0106 | 1.2766 | 16500 | 0.0242 | 4.3899 | 1.8827 |
| 0.0089 | 1.3153 | 17000 | 0.0234 | 3.7252 | 1.3949 |
| 0.0078 | 1.3540 | 17500 | 0.0223 | 3.7217 | 1.6103 |
| 0.0091 | 1.3926 | 18000 | 0.0216 | 3.8284 | 1.6104 |
| 0.0096 | 1.4313 | 18500 | 0.0200 | 3.2519 | 1.5155 |
| 0.0083 | 1.4700 | 19000 | 0.0188 | 3.3168 | 1.3898 |
| 0.0072 | 1.5087 | 19500 | 0.0176 | 3.1231 | 1.4695 |
| 0.0083 | 1.5474 | 20000 | 0.0166 | 3.6625 | 1.6818 |
| 0.0111 | 1.5861 | 20500 | 0.0155 | 2.5152 | 1.1298 |
| 0.0068 | 1.6248 | 21000 | 0.0149 | 2.4142 | 0.9976 |
| 0.0055 | 1.6634 | 21500 | 0.0141 | 2.6451 | 1.3030 |
| 0.0123 | 1.7021 | 22000 | 0.0132 | 2.6289 | 1.2809 |
| 0.0079 | 1.7408 | 22500 | 0.0126 | 2.2576 | 0.9550 |
| 0.0112 | 1.7795 | 23000 | 0.0119 | 2.6149 | 1.3460 |
| 0.0087 | 1.8182 | 23500 | 0.0114 | 2.2878 | 1.1265 |
| 0.0062 | 1.8569 | 24000 | 0.0109 | 2.1903 | 1.0690 |
| 0.0051 | 1.8956 | 24500 | 0.0106 | 2.1277 | 1.0283 |
| 0.0077 | 1.9342 | 25000 | 0.0104 | 2.0650 | 0.9906 |

### Framework versions

- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.1
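With the versions listed above, a minimal transcription sketch using the standard Transformers ASR pipeline might look like the following (assumes `ffmpeg` is available for audio decoding; `audio.wav` is a hypothetical file name):

```python
from transformers import pipeline

# Load this fine-tuned checkpoint into the automatic-speech-recognition pipeline.
asr = pipeline(
    "automatic-speech-recognition",
    model="artyomboyko/whisper-small-ru-v4",
)

# "audio.wav" is a placeholder; any file readable by ffmpeg works.
# Forcing the language avoids Whisper's automatic language detection.
result = asr("audio.wav", generate_kwargs={"language": "russian"})
print(result["text"])
```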