---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-small
tags:
- audio
- automatic-speech-recognition
- generated_from_trainer
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
metrics:
- wer
- cer
model-index:
- name: whisper-small-ru-v4
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 21.0
      type: mozilla-foundation/common_voice_21_0
      config: ru
      split: test
      args:
        language: ru
    metrics:
    - name: Wer
      type: wer
      value: 2.0650
    - name: Cer
      type: cer
      value: 0.9906
language:
- ru
pipeline_tag: automatic-speech-recognition
datasets:
- artyomboyko/common_voice_21_0_ru
---

# whisper-small-ru-v4

> ***NOTE: EXPERIMENTAL MODEL!***
> ***This is the best model obtained at the end of the fine-tuning process. Further inference testing has not yet been performed.***

This model is a fine-tuned version of [artyomboyko/whisper-small-ru-v3](https://huggingface.co/artyomboyko/whisper-small-ru-v3) (itself a fine-tuned version of the base model [openai/whisper-small](https://huggingface.co/openai/whisper-small)) on the Russian portion of the [Common Voice 21.0 dataset](https://commonvoice.mozilla.org/).
It achieves the following results on the evaluation set:
- Loss: 0.0104
- Wer: 2.0650
- Cer: 0.9906

## Model description

A fine-tuned [Whisper small](https://huggingface.co/openai/whisper-small) checkpoint specialized for Russian automatic speech recognition (transcription of Russian speech to text).

## Intended uses & limitations

Intended for transcribing Russian speech. The model is experimental: it is the best checkpoint obtained at the end of the fine-tuning run and has not yet been validated with further inference testing, so quality on out-of-domain audio is unknown. A minimal transcription sketch is shown below.

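The snippet below assumes the repository id `artyomboyko/whisper-small-ru-v4` (inferred from the model name and base-model lineage) and uses a placeholder audio path.

```python
# Hedged usage sketch: transcribe Russian audio with the 🤗 Transformers ASR pipeline.
# "artyomboyko/whisper-small-ru-v4" is an assumed repo id; "sample.flac" is a placeholder path.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="artyomboyko/whisper-small-ru-v4",
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# Whisper expects 16 kHz audio; the pipeline handles decoding and resampling of file inputs.
result = asr(
    "sample.flac",
    generate_kwargs={"language": "russian", "task": "transcribe"},
)
print(result["text"])
```
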
## Training and evaluation data

The model was fine-tuned and evaluated on the Russian data of Common Voice 21.0 (the [artyomboyko/common_voice_21_0_ru](https://huggingface.co/datasets/artyomboyko/common_voice_21_0_ru) dataset listed in the metadata above); the WER/CER reported here are measured on the `test` split. A hedged evaluation sketch is shown below.

Training hardware: 1 x [MSI GeForce RTX 4090 SUPRIM](https://www.msi.com/Graphics-Card/GeForce-RTX-4090-SUPRIM-24G)

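As a rough sketch (not the exact evaluation script used for this card), WER and CER on the test split could be reproduced as follows. The dataset id comes from the card metadata; the `audio` and `sentence` column names follow the usual Common Voice layout and may differ in this mirror.

```python
# Hedged evaluation sketch for WER/CER on the Russian Common Voice test split.
# Dataset id comes from the card metadata; the "audio"/"sentence" column names follow
# the usual Common Voice layout and may differ in this mirror.
import evaluate
from datasets import Audio, load_dataset
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="artyomboyko/whisper-small-ru-v4",  # assumed repo id
)

ds = load_dataset("artyomboyko/common_voice_21_0_ru", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions, references = [], []
for sample in ds.select(range(100)):  # small subset, for illustration only
    audio = sample["audio"]
    out = asr(
        {"array": audio["array"], "sampling_rate": audio["sampling_rate"]},
        generate_kwargs={"language": "russian", "task": "transcribe"},
    )
    predictions.append(out["text"])
    references.append(sample["sentence"])

# Multiply by 100 to match the percentage scale reported in this card.
print("WER:", 100 * wer_metric.compute(predictions=predictions, references=references))
print("CER:", 100 * cer_metric.compute(predictions=predictions, references=references))
```
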
## Training procedure

Model training time: 28h 47m

### Training hyperparameters

The following hyperparameters were used during training (a hedged `Seq2SeqTrainingArguments` sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 250
- training_steps: 25000
- mixed_precision_training: Native AMP

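The list above corresponds roughly to the following 🤗 Transformers `Seq2SeqTrainingArguments`. This is a hedged reconstruction from the reported values, not the author's actual training script; `output_dir` and the evaluation/save cadence are assumptions (the table below suggests evaluation every 500 steps).

```python
# Hedged reconstruction of training arguments from the hyperparameters listed above.
# Not the author's actual script: output_dir and the eval/save cadence are assumed.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-ru-v4",   # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",                # AdamW (torch), betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=250,
    max_steps=25_000,
    fp16=True,                          # "Native AMP" mixed precision
    eval_strategy="steps",              # assumed: table below reports eval every 500 steps
    eval_steps=500,
    save_steps=500,
    predict_with_generate=True,         # needed to compute WER/CER during evaluation
    report_to="none",
)
```
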
### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|:-------------:|:------:|:-----:|:---------------:|:-------:|:------:|
| 0.0683 | 0.0387 | 500 | 0.1521 | 13.4494 | 4.4901 |
| 0.059 | 0.0774 | 1000 | 0.1434 | 12.1396 | 3.6132 |
| 0.0584 | 0.1161 | 1500 | 0.1382 | 11.9180 | 3.3839 |
| 0.0551 | 0.1547 | 2000 | 0.1314 | 11.2753 | 3.3867 |
| 0.0513 | 0.1934 | 2500 | 0.1242 | 10.6755 | 3.0711 |
| 0.0616 | 0.2321 | 3000 | 0.1199 | 10.8194 | 3.3670 |
| 0.0524 | 0.2708 | 3500 | 0.1130 | 10.0340 | 2.8311 |
| 0.0465 | 0.3095 | 4000 | 0.1057 | 10.0108 | 3.1744 |
| 0.0588 | 0.3482 | 4500 | 0.1026 | 10.1871 | 3.4398 |
| 0.0498 | 0.3868 | 5000 | 0.0951 | 8.9527 | 2.7278 |
| 0.0488 | 0.4255 | 5500 | 0.0915 | 9.2033 | 3.0227 |
| 0.0501 | 0.4642 | 6000 | 0.0876 | 8.8043 | 2.7854 |
| 0.0428 | 0.5029 | 6500 | 0.0835 | 8.3066 | 2.6446 |
| 0.0463 | 0.5416 | 7000 | 0.0793 | 7.5861 | 2.3860 |
| 0.0516 | 0.5803 | 7500 | 0.0752 | 7.8959 | 2.6551 |
| 0.0442 | 0.6190 | 8000 | 0.0702 | 7.5687 | 2.4814 |
| 0.0393 | 0.6576 | 8500 | 0.0655 | 7.0072 | 2.1594 |
| 0.0455 | 0.6963 | 9000 | 0.0606 | 6.4202 | 1.9970 |
| 0.0371 | 0.7350 | 9500 | 0.0567 | 6.7253 | 2.2651 |
| 0.041 | 0.7737 | 10000 | 0.0524 | 6.4851 | 2.1622 |
| 0.0368 | 0.8124 | 10500 | 0.0497 | 5.4596 | 1.5878 |
| 0.0397 | 0.8511 | 11000 | 0.0455 | 5.7566 | 2.1294 |
| 0.0342 | 0.8897 | 11500 | 0.0429 | 5.1382 | 1.6793 |
| 0.0322 | 0.9284 | 12000 | 0.0382 | 4.7786 | 1.5893 |
| 0.0316 | 0.9671 | 12500 | 0.0349 | 5.3842 | 2.3248 |
| 0.008 | 1.0058 | 13000 | 0.0315 | 4.2403 | 1.2860 |
| 0.0122 | 1.0445 | 13500 | 0.0303 | 4.7983 | 2.0351 |
| 0.0118 | 1.0832 | 14000 | 0.0285 | 4.9955 | 2.4634 |
| 0.0121 | 1.1219 | 14500 | 0.0285 | 5.0744 | 2.1732 |
| 0.01 | 1.1605 | 15000 | 0.0271 | 4.5906 | 1.8766 |
| 0.0093 | 1.1992 | 15500 | 0.0261 | 3.6103 | 1.3770 |
| 0.0102 | 1.2379 | 16000 | 0.0251 | 4.0651 | 1.5117 |
| 0.0106 | 1.2766 | 16500 | 0.0242 | 4.3899 | 1.8827 |
| 0.0089 | 1.3153 | 17000 | 0.0234 | 3.7252 | 1.3949 |
| 0.0078 | 1.3540 | 17500 | 0.0223 | 3.7217 | 1.6103 |
| 0.0091 | 1.3926 | 18000 | 0.0216 | 3.8284 | 1.6104 |
| 0.0096 | 1.4313 | 18500 | 0.0200 | 3.2519 | 1.5155 |
| 0.0083 | 1.4700 | 19000 | 0.0188 | 3.3168 | 1.3898 |
| 0.0072 | 1.5087 | 19500 | 0.0176 | 3.1231 | 1.4695 |
| 0.0083 | 1.5474 | 20000 | 0.0166 | 3.6625 | 1.6818 |
| 0.0111 | 1.5861 | 20500 | 0.0155 | 2.5152 | 1.1298 |
| 0.0068 | 1.6248 | 21000 | 0.0149 | 2.4142 | 0.9976 |
| 0.0055 | 1.6634 | 21500 | 0.0141 | 2.6451 | 1.3030 |
| 0.0123 | 1.7021 | 22000 | 0.0132 | 2.6289 | 1.2809 |
| 0.0079 | 1.7408 | 22500 | 0.0126 | 2.2576 | 0.9550 |
| 0.0112 | 1.7795 | 23000 | 0.0119 | 2.6149 | 1.3460 |
| 0.0087 | 1.8182 | 23500 | 0.0114 | 2.2878 | 1.1265 |
| 0.0062 | 1.8569 | 24000 | 0.0109 | 2.1903 | 1.0690 |
| 0.0051 | 1.8956 | 24500 | 0.0106 | 2.1277 | 1.0283 |
| 0.0077 | 1.9342 | 25000 | 0.0104 | 2.0650 | 0.9906 |

### Framework versions

- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.1