---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-small
tags:
- audio
- automatic-speech-recognition
- generated_from_trainer
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
metrics:
- wer
- cer
model-index:
- name: whisper-small-ru-v4
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 21.0
      type: mozilla-foundation/common_voice_21_0
      config: ru
      split: test
      args:
        language: ru
    metrics:
    - name: Wer
      type: wer
      value: 2.0650
    - name: Cer
      type: cer
      value: 0.9906
language:
- ru
pipeline_tag: automatic-speech-recognition
datasets:
- artyomboyko/common_voice_21_0_ru
---

# whisper-small-ru-v4

> ***NOTE: EXPERIMENTAL MODEL!***
> ***This is the best model obtained at the end of the fine-tuning process. Further inference testing has not yet been performed.***

This model is a fine-tuned version of [artyomboyko/whisper-small-ru-v3](https://huggingface.co/artyomboyko/whisper-small-ru-v3) (itself a fine-tuned version of the base model [openai/whisper-small](https://huggingface.co/openai/whisper-small)) on the Russian portion of the [Common Voice 21.0 dataset](https://commonvoice.mozilla.org/).
It achieves the following results on the evaluation set:
- Loss: 0.0104
- Wer: 2.0650
- Cer: 0.9906

## Model description

A fine-tuned [Whisper small](https://huggingface.co/openai/whisper-small) checkpoint specialized for Russian automatic speech recognition (transcription of Russian speech to text).

## Intended uses & limitations

Intended for transcribing Russian speech. The model is experimental: it is the best checkpoint obtained at the end of the fine-tuning run and has not yet been validated with further inference testing, so quality on out-of-domain audio is unknown. A minimal transcription sketch is shown below.

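The snippet below assumes the repository id `artyomboyko/whisper-small-ru-v4` (inferred from the model name and base-model lineage) and uses a placeholder audio path.

```python
# Hedged usage sketch: transcribe Russian audio with the 🤗 Transformers ASR pipeline.
# "artyomboyko/whisper-small-ru-v4" is an assumed repo id; "sample.flac" is a placeholder path.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="artyomboyko/whisper-small-ru-v4",
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# Whisper expects 16 kHz audio; the pipeline handles decoding and resampling of file inputs.
result = asr(
    "sample.flac",
    generate_kwargs={"language": "russian", "task": "transcribe"},
)
print(result["text"])
```
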
## Training and evaluation data

The model was fine-tuned and evaluated on the Russian data of Common Voice 21.0 (the [artyomboyko/common_voice_21_0_ru](https://huggingface.co/datasets/artyomboyko/common_voice_21_0_ru) dataset listed in the metadata above); the WER/CER reported here are measured on the `test` split. A hedged evaluation sketch is shown below.

Training hardware: 1 x [MSI GeForce RTX 4090 SUPRIM](https://www.msi.com/Graphics-Card/GeForce-RTX-4090-SUPRIM-24G)

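As a rough sketch (not the exact evaluation script used for this card), WER and CER on the test split could be reproduced as follows. The dataset id comes from the card metadata; the `audio` and `sentence` column names follow the usual Common Voice layout and may differ in this mirror.

```python
# Hedged evaluation sketch for WER/CER on the Russian Common Voice test split.
# Dataset id comes from the card metadata; the "audio"/"sentence" column names follow
# the usual Common Voice layout and may differ in this mirror.
import evaluate
from datasets import Audio, load_dataset
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="artyomboyko/whisper-small-ru-v4",  # assumed repo id
)

ds = load_dataset("artyomboyko/common_voice_21_0_ru", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions, references = [], []
for sample in ds.select(range(100)):  # small subset, for illustration only
    audio = sample["audio"]
    out = asr(
        {"array": audio["array"], "sampling_rate": audio["sampling_rate"]},
        generate_kwargs={"language": "russian", "task": "transcribe"},
    )
    predictions.append(out["text"])
    references.append(sample["sentence"])

# Multiply by 100 to match the percentage scale reported in this card.
print("WER:", 100 * wer_metric.compute(predictions=predictions, references=references))
print("CER:", 100 * cer_metric.compute(predictions=predictions, references=references))
```
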
## Training procedure

Model training time: 28h 47m

### Training hyperparameters

The following hyperparameters were used during training (a hedged `Seq2SeqTrainingArguments` sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 250
- training_steps: 25000
- mixed_precision_training: Native AMP

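The list above corresponds roughly to the following 🤗 Transformers `Seq2SeqTrainingArguments`. This is a hedged reconstruction from the reported values, not the author's actual training script; `output_dir` and the evaluation/save cadence are assumptions (the table below suggests evaluation every 500 steps).

```python
# Hedged reconstruction of training arguments from the hyperparameters listed above.
# Not the author's actual script: output_dir and the eval/save cadence are assumed.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-ru-v4",   # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",                # AdamW (torch), betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=250,
    max_steps=25_000,
    fp16=True,                          # "Native AMP" mixed precision
    eval_strategy="steps",              # assumed: table below reports eval every 500 steps
    eval_steps=500,
    save_steps=500,
    predict_with_generate=True,         # needed to compute WER/CER during evaluation
    report_to="none",
)
```
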
### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|:-------------:|:------:|:-----:|:---------------:|:-------:|:------:|
| 0.0683 | 0.0387 | 500 | 0.1521 | 13.4494 | 4.4901 |
| 0.059 | 0.0774 | 1000 | 0.1434 | 12.1396 | 3.6132 |
| 0.0584 | 0.1161 | 1500 | 0.1382 | 11.9180 | 3.3839 |
| 0.0551 | 0.1547 | 2000 | 0.1314 | 11.2753 | 3.3867 |
| 0.0513 | 0.1934 | 2500 | 0.1242 | 10.6755 | 3.0711 |
| 0.0616 | 0.2321 | 3000 | 0.1199 | 10.8194 | 3.3670 |
| 0.0524 | 0.2708 | 3500 | 0.1130 | 10.0340 | 2.8311 |
| 0.0465 | 0.3095 | 4000 | 0.1057 | 10.0108 | 3.1744 |
| 0.0588 | 0.3482 | 4500 | 0.1026 | 10.1871 | 3.4398 |
| 0.0498 | 0.3868 | 5000 | 0.0951 | 8.9527 | 2.7278 |
| 0.0488 | 0.4255 | 5500 | 0.0915 | 9.2033 | 3.0227 |
| 0.0501 | 0.4642 | 6000 | 0.0876 | 8.8043 | 2.7854 |
| 0.0428 | 0.5029 | 6500 | 0.0835 | 8.3066 | 2.6446 |
| 0.0463 | 0.5416 | 7000 | 0.0793 | 7.5861 | 2.3860 |
| 0.0516 | 0.5803 | 7500 | 0.0752 | 7.8959 | 2.6551 |
| 0.0442 | 0.6190 | 8000 | 0.0702 | 7.5687 | 2.4814 |
| 0.0393 | 0.6576 | 8500 | 0.0655 | 7.0072 | 2.1594 |
| 0.0455 | 0.6963 | 9000 | 0.0606 | 6.4202 | 1.9970 |
| 0.0371 | 0.7350 | 9500 | 0.0567 | 6.7253 | 2.2651 |
| 0.041 | 0.7737 | 10000 | 0.0524 | 6.4851 | 2.1622 |
| 0.0368 | 0.8124 | 10500 | 0.0497 | 5.4596 | 1.5878 |
| 0.0397 | 0.8511 | 11000 | 0.0455 | 5.7566 | 2.1294 |
| 0.0342 | 0.8897 | 11500 | 0.0429 | 5.1382 | 1.6793 |
| 0.0322 | 0.9284 | 12000 | 0.0382 | 4.7786 | 1.5893 |
| 0.0316 | 0.9671 | 12500 | 0.0349 | 5.3842 | 2.3248 |
| 0.008 | 1.0058 | 13000 | 0.0315 | 4.2403 | 1.2860 |
| 0.0122 | 1.0445 | 13500 | 0.0303 | 4.7983 | 2.0351 |
| 0.0118 | 1.0832 | 14000 | 0.0285 | 4.9955 | 2.4634 |
| 0.0121 | 1.1219 | 14500 | 0.0285 | 5.0744 | 2.1732 |
| 0.01 | 1.1605 | 15000 | 0.0271 | 4.5906 | 1.8766 |
| 0.0093 | 1.1992 | 15500 | 0.0261 | 3.6103 | 1.3770 |
| 0.0102 | 1.2379 | 16000 | 0.0251 | 4.0651 | 1.5117 |
| 0.0106 | 1.2766 | 16500 | 0.0242 | 4.3899 | 1.8827 |
| 0.0089 | 1.3153 | 17000 | 0.0234 | 3.7252 | 1.3949 |
| 0.0078 | 1.3540 | 17500 | 0.0223 | 3.7217 | 1.6103 |
| 0.0091 | 1.3926 | 18000 | 0.0216 | 3.8284 | 1.6104 |
| 0.0096 | 1.4313 | 18500 | 0.0200 | 3.2519 | 1.5155 |
| 0.0083 | 1.4700 | 19000 | 0.0188 | 3.3168 | 1.3898 |
| 0.0072 | 1.5087 | 19500 | 0.0176 | 3.1231 | 1.4695 |
| 0.0083 | 1.5474 | 20000 | 0.0166 | 3.6625 | 1.6818 |
| 0.0111 | 1.5861 | 20500 | 0.0155 | 2.5152 | 1.1298 |
| 0.0068 | 1.6248 | 21000 | 0.0149 | 2.4142 | 0.9976 |
| 0.0055 | 1.6634 | 21500 | 0.0141 | 2.6451 | 1.3030 |
| 0.0123 | 1.7021 | 22000 | 0.0132 | 2.6289 | 1.2809 |
| 0.0079 | 1.7408 | 22500 | 0.0126 | 2.2576 | 0.9550 |
| 0.0112 | 1.7795 | 23000 | 0.0119 | 2.6149 | 1.3460 |
| 0.0087 | 1.8182 | 23500 | 0.0114 | 2.2878 | 1.1265 |
| 0.0062 | 1.8569 | 24000 | 0.0109 | 2.1903 | 1.0690 |
| 0.0051 | 1.8956 | 24500 | 0.0106 | 2.1277 | 1.0283 |
| 0.0077 | 1.9342 | 25000 | 0.0104 | 2.0650 | 0.9906 |

### Framework versions

- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.1