---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-small
tags:
  - audio
  - automatic-speech-recognition
  - generated_from_trainer
widget:
  - example_title: Librispeech sample 1
    src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
  - example_title: Librispeech sample 2
    src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
metrics:
  - wer
  - cer
model-index:
  - name: whisper-small-ru-v4
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 21.0
          type: mozilla-foundation/common_voice_21_0
          config: ru
          split: test
          args:
            language: ru
        metrics:
          - name: Wer
            type: wer
            value: 2.065
          - name: Cer
            type: cer
            value: 0.9906
language:
  - ru
pipeline_tag: automatic-speech-recognition
datasets:
  - artyomboyko/common_voice_21_0_ru
---

# whisper-small-ru-v4

**NOTE: EXPERIMENTAL MODEL!**
This is the best model obtained at the end of the fine-tuning process. Further inference testing has not yet been performed.

This model is a fine-tuned version of artyomboyko/whisper-small-ru-v3 (itself a fine-tuned version of the base model openai/whisper-small) on the Common Voice 21.0 Russian dataset. It achieves the following results on the evaluation set:

- Loss: 0.0104
- Wer: 2.0650
- Cer: 0.9906
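
A minimal inference sketch using the 🤗 Transformers ASR pipeline. The repository id `artyomboyko/whisper-small-ru-v4` is assumed from the model name; adjust it if the checkpoint lives elsewhere:

```python
import torch
from transformers import pipeline

# Repository id is an assumption inferred from the model name.
asr = pipeline(
    "automatic-speech-recognition",
    model="artyomboyko/whisper-small-ru-v4",
    device=0 if torch.cuda.is_available() else -1,
)

# Transcribe a local audio file (any format ffmpeg/soundfile can decode).
result = asr(
    "sample.wav",
    generate_kwargs={"language": "russian", "task": "transcribe"},
)
print(result["text"])
```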

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Training was performed on a single MSI Suprim RTX 4090 GPU.
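
For reference, a sketch of loading the dataset listed in the metadata with 🤗 Datasets. The split name and audio column are assumptions; the dataset card is authoritative:

```python
from datasets import Audio, load_dataset

# Split and column names are assumptions; check the dataset card.
ds = load_dataset("artyomboyko/common_voice_21_0_ru", split="test")

# Whisper's feature extractor expects 16 kHz audio.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
print(ds[0]["audio"]["array"].shape)
```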

## Training procedure

Model training time: 28h 47m

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 250
- training_steps: 25000
- mixed_precision_training: Native AMP
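
The same settings expressed as 🤗 Transformers `Seq2SeqTrainingArguments` (a sketch: `output_dir`, the evaluation cadence, and `predict_with_generate` are assumptions inferred from the 500-step results table below):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ru-v4",  # assumption
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    warmup_steps=250,
    max_steps=25_000,
    fp16=True,  # "Native AMP" mixed-precision training
    eval_strategy="steps",
    eval_steps=500,  # assumption, matches the cadence of the results table
    predict_with_generate=True,  # assumption, needed for WER/CER during eval
)
```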

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Wer     | Cer    |
|:-------------:|:------:|:-----:|:---------------:|:-------:|:------:|
| 0.0683        | 0.0387 | 500   | 0.1521          | 13.4494 | 4.4901 |
| 0.059         | 0.0774 | 1000  | 0.1434          | 12.1396 | 3.6132 |
| 0.0584        | 0.1161 | 1500  | 0.1382          | 11.9180 | 3.3839 |
| 0.0551        | 0.1547 | 2000  | 0.1314          | 11.2753 | 3.3867 |
| 0.0513        | 0.1934 | 2500  | 0.1242          | 10.6755 | 3.0711 |
| 0.0616        | 0.2321 | 3000  | 0.1199          | 10.8194 | 3.3670 |
| 0.0524        | 0.2708 | 3500  | 0.1130          | 10.0340 | 2.8311 |
| 0.0465        | 0.3095 | 4000  | 0.1057          | 10.0108 | 3.1744 |
| 0.0588        | 0.3482 | 4500  | 0.1026          | 10.1871 | 3.4398 |
| 0.0498        | 0.3868 | 5000  | 0.0951          | 8.9527  | 2.7278 |
| 0.0488        | 0.4255 | 5500  | 0.0915          | 9.2033  | 3.0227 |
| 0.0501        | 0.4642 | 6000  | 0.0876          | 8.8043  | 2.7854 |
| 0.0428        | 0.5029 | 6500  | 0.0835          | 8.3066  | 2.6446 |
| 0.0463        | 0.5416 | 7000  | 0.0793          | 7.5861  | 2.3860 |
| 0.0516        | 0.5803 | 7500  | 0.0752          | 7.8959  | 2.6551 |
| 0.0442        | 0.6190 | 8000  | 0.0702          | 7.5687  | 2.4814 |
| 0.0393        | 0.6576 | 8500  | 0.0655          | 7.0072  | 2.1594 |
| 0.0455        | 0.6963 | 9000  | 0.0606          | 6.4202  | 1.9970 |
| 0.0371        | 0.7350 | 9500  | 0.0567          | 6.7253  | 2.2651 |
| 0.041         | 0.7737 | 10000 | 0.0524          | 6.4851  | 2.1622 |
| 0.0368        | 0.8124 | 10500 | 0.0497          | 5.4596  | 1.5878 |
| 0.0397        | 0.8511 | 11000 | 0.0455          | 5.7566  | 2.1294 |
| 0.0342        | 0.8897 | 11500 | 0.0429          | 5.1382  | 1.6793 |
| 0.0322        | 0.9284 | 12000 | 0.0382          | 4.7786  | 1.5893 |
| 0.0316        | 0.9671 | 12500 | 0.0349          | 5.3842  | 2.3248 |
| 0.008         | 1.0058 | 13000 | 0.0315          | 4.2403  | 1.2860 |
| 0.0122        | 1.0445 | 13500 | 0.0303          | 4.7983  | 2.0351 |
| 0.0118        | 1.0832 | 14000 | 0.0285          | 4.9955  | 2.4634 |
| 0.0121        | 1.1219 | 14500 | 0.0285          | 5.0744  | 2.1732 |
| 0.01          | 1.1605 | 15000 | 0.0271          | 4.5906  | 1.8766 |
| 0.0093        | 1.1992 | 15500 | 0.0261          | 3.6103  | 1.3770 |
| 0.0102        | 1.2379 | 16000 | 0.0251          | 4.0651  | 1.5117 |
| 0.0106        | 1.2766 | 16500 | 0.0242          | 4.3899  | 1.8827 |
| 0.0089        | 1.3153 | 17000 | 0.0234          | 3.7252  | 1.3949 |
| 0.0078        | 1.3540 | 17500 | 0.0223          | 3.7217  | 1.6103 |
| 0.0091        | 1.3926 | 18000 | 0.0216          | 3.8284  | 1.6104 |
| 0.0096        | 1.4313 | 18500 | 0.0200          | 3.2519  | 1.5155 |
| 0.0083        | 1.4700 | 19000 | 0.0188          | 3.3168  | 1.3898 |
| 0.0072        | 1.5087 | 19500 | 0.0176          | 3.1231  | 1.4695 |
| 0.0083        | 1.5474 | 20000 | 0.0166          | 3.6625  | 1.6818 |
| 0.0111        | 1.5861 | 20500 | 0.0155          | 2.5152  | 1.1298 |
| 0.0068        | 1.6248 | 21000 | 0.0149          | 2.4142  | 0.9976 |
| 0.0055        | 1.6634 | 21500 | 0.0141          | 2.6451  | 1.3030 |
| 0.0123        | 1.7021 | 22000 | 0.0132          | 2.6289  | 1.2809 |
| 0.0079        | 1.7408 | 22500 | 0.0126          | 2.2576  | 0.9550 |
| 0.0112        | 1.7795 | 23000 | 0.0119          | 2.6149  | 1.3460 |
| 0.0087        | 1.8182 | 23500 | 0.0114          | 2.2878  | 1.1265 |
| 0.0062        | 1.8569 | 24000 | 0.0109          | 2.1903  | 1.0690 |
| 0.0051        | 1.8956 | 24500 | 0.0106          | 2.1277  | 1.0283 |
| 0.0077        | 1.9342 | 25000 | 0.0104          | 2.0650  | 0.9906 |
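
The Wer and Cer columns are percentages. A minimal sketch of how such scores are typically computed during Whisper fine-tuning with the `evaluate` library (the example strings are illustrative):

```python
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Illustrative reference/prediction pairs, not data from this run.
references = ["привет мир"]
predictions = ["привет мир"]

# evaluate returns error rates as fractions; multiply by 100 for percentages.
wer = 100 * wer_metric.compute(references=references, predictions=predictions)
cer = 100 * cer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.4f}  CER: {cer:.4f}")
```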

### Framework versions

- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.1