Whisper Edu v0.2

Whisper-base fine-tuned on 70 000 lecture audio samples to adapt it to educational recordings (biology, history, art, etc.).
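Whisper consumes fixed 30-second windows of 16 kHz mono audio, so long lecture recordings have to be cut into chunks before fine-tuning. A minimal sketch of that preprocessing step (an assumption about how the dataset was prepared, not the exact script used for this model):

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper's expected input sample rate
CHUNK_SECONDS = 30     # Whisper's fixed context window

def chunk_waveform(waveform: np.ndarray) -> list[np.ndarray]:
    """Split a mono waveform into 30-second chunks, zero-padding the last one."""
    chunk_len = SAMPLE_RATE * CHUNK_SECONDS
    chunks = []
    for start in range(0, len(waveform), chunk_len):
        chunk = waveform[start:start + chunk_len]
        if len(chunk) < chunk_len:  # pad the tail chunk to a full window
            chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks

# a 75-second lecture snippet yields three 30-second chunks
audio = np.zeros(75 * SAMPLE_RATE, dtype=np.float32)
print(len(chunk_waveform(audio)))  # 3
```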

…


Final eval WER = 12.76 % (70 k train chunks, 2 k val, 4 000 steps)
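The WER figures here are word error rate: word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. Runs like this typically compute it with jiwer or the evaluate library; a self-contained reimplementation for illustration:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cell membrane is semi permeable",
          "the cell membrane is permeable"))  # 1 deletion / 6 words ≈ 0.1667
```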

Training ran for 4 000 steps (1.8 epochs) with evaluation every 500 steps. Condensed log (the raw output also contained tqdm progress bars plus Transformers deprecation warnings about legacy `past_key_values` tuples, language-detection defaults for multilingual Whisper, and generation parameters stored in the model config):

| Step | Epoch | Train loss | Eval loss | Eval WER |
|-----:|------:|-----------:|----------:|---------:|
|  500 | 0.22 | 0.2323 | 0.2292 | 14.01 % |
| 1000 | 0.45 | 0.2280 | 0.2212 | 13.49 % |
| 1500 | 0.67 | 0.2222 | 0.2158 | 13.26 % |
| 2000 | 0.90 | 0.2166 | 0.2124 | 12.91 % |
| 2500 | 1.12 | 0.1814 | 0.2104 | 12.83 % |
| 3000 | 1.35 | 0.1770 | 0.2097 | 12.82 % |
| 3500 | 1.57 | 0.1754 | 0.2081 | 12.78 % |
| 4000 | 1.80 | 0.1808 | 0.2076 | 12.76 % |

Total: 70 422 s train runtime (~19.6 h), average train loss 0.2097. On reloading the best checkpoint, Transformers reported a missing `proj_out.weight` key; in Whisper this weight is tied to the decoder token embeddings, so the message is expected rather than a sign of a corrupt checkpoint.
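The learning rates in the log trace a standard linear warmup/decay schedule: ramp from 0 to a peak of 1e-5 over the first 500 steps, then linear decay to 0 at step 4 000. The peak and warmup values below are reconstructed from the logged rates, not taken from the original training script:

```python
PEAK_LR = 1e-5       # inferred from the logged learning rates
WARMUP_STEPS = 500   # inferred: the rate peaks right around step 500
TOTAL_STEPS = 4000

def linear_schedule_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero
    (get_linear_schedule_with_warmup-style)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# the log's 9.717e-06 around step 600 matches the rate just before that step:
print(f"{linear_schedule_lr(599):.3e}")  # 9.717e-06
```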

Finished – model saved to C:\Users\Rodo\Desktop\Projects\Dataset-1\whisper-v0.2-EduDataset

Model size: 72.6M params (Safetensors, F32)

Model tree: Rodolfo98Mendoza/whisper-edu-v0.2, finetuned from Whisper-base.