# w2v-bert-2.0-hausa_250_250h
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.2732
- Wer: 0.3433
- Cer: 0.1936
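
For reference, a minimal transcription sketch using the transformers CTC API. The repo id, audio file, and librosa-based loading are assumptions for illustration, not part of this card:

```python
import torch
import librosa
from transformers import AutoModelForCTC, AutoProcessor

# Hypothetical Hub id; replace with the actual path of this checkpoint.
MODEL_ID = "your-org/w2v-bert-2.0-hausa_250_250h"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCTC.from_pretrained(MODEL_ID)
model.eval()

# w2v-BERT 2.0 expects 16 kHz mono audio.
speech, _ = librosa.load("hausa_sample.wav", sr=16_000, mono=True)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: pick the most likely token at each frame,
# then let the tokenizer collapse repeats and remove blanks.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```
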
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 160
- eval_batch_size: 160
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 320
- total_eval_batch_size: 320
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 50.0
- mixed_precision_training: Native AMP
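
A sketch of a `TrainingArguments` configuration matching the values above; `output_dir` and any logging/saving details are assumptions not stated in this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="w2v-bert-2.0-hausa_250_250h",  # hypothetical
    learning_rate=1e-4,
    per_device_train_batch_size=160,  # 2 GPUs -> total train batch size 320
    per_device_eval_batch_size=160,   # 2 GPUs -> total eval batch size 320
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=50.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```
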
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|:-------------:|:-------:|:-----:|:---------------:|:------:|:------:|
| 0.5401 | 0.6406 | 1000 | 0.3666 | 0.3864 | 0.2055 |
| 0.1643 | 1.2812 | 2000 | 0.3028 | 0.3591 | 0.1978 |
| 0.0957 | 1.9218 | 3000 | 0.2884 | 0.3526 | 0.1956 |
| 0.0689 | 2.5625 | 4000 | 0.2853 | 0.3522 | 0.1948 |
| 0.2318 | 3.2031 | 5000 | 0.2871 | 0.3680 | 0.1992 |
| 0.1863 | 3.8437 | 6000 | 0.2880 | 0.3629 | 0.1985 |
| 0.0662 | 4.4843 | 7000 | 0.3047 | 0.3826 | 0.2037 |
| 0.234 | 5.1249 | 8000 | 0.2872 | 0.3585 | 0.1978 |
| 0.2175 | 5.7655 | 9000 | 0.2786 | 0.3546 | 0.1969 |
| 0.0557 | 6.4061 | 10000 | 0.2873 | 0.3668 | 0.2017 |
| 0.1808 | 7.0468 | 11000 | 0.2740 | 0.3486 | 0.1956 |
| 0.2526 | 7.6874 | 12000 | 0.2779 | 0.3553 | 0.1970 |
| 0.0698 | 8.3280 | 13000 | 0.2765 | 0.3520 | 0.1969 |
| 0.1459 | 8.9686 | 14000 | 0.2823 | 0.3546 | 0.1965 |
| 0.1818 | 9.6092 | 15000 | 0.2699 | 0.3441 | 0.1942 |
| 0.1141 | 10.2498 | 16000 | 0.2737 | 0.3515 | 0.1965 |
| 0.0851 | 10.8905 | 17000 | 0.2654 | 0.3494 | 0.1957 |
| 0.0612 | 11.5311 | 18000 | 0.2636 | 0.3478 | 0.1946 |
| 0.1456 | 12.1717 | 19000 | 0.2618 | 0.3431 | 0.1937 |
| 0.1322 | 12.8123 | 20000 | 0.2659 | 0.3495 | 0.1952 |
| 0.0377 | 13.4529 | 21000 | 0.2696 | 0.3462 | 0.1950 |
| 0.1161 | 14.0935 | 22000 | 0.2655 | 0.3435 | 0.1943 |
| 0.1446 | 14.7341 | 23000 | 0.2561 | 0.3418 | 0.1931 |
| 0.059 | 15.3748 | 24000 | 0.2668 | 0.3447 | 0.1937 |
| 0.1723 | 16.0154 | 25000 | 0.2654 | 0.3410 | 0.1940 |
| 0.1659 | 16.6560 | 26000 | 0.2635 | 0.3461 | 0.1947 |
| 0.0688 | 17.2966 | 27000 | 0.2602 | 0.3416 | 0.1928 |
| 0.0738 | 17.9372 | 28000 | 0.2732 | 0.3433 | 0.1936 |
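
The Wer and Cer columns are word- and character-error rates. A minimal sketch of how such scores can be computed with the `evaluate` library; the transcript strings are placeholders, not data from this run:

```python
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Placeholder transcripts; in practice these come from model predictions
# and the evaluation set's reference texts.
predictions = ["ina son ruwa"]
references = ["ina son ruwan"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```
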
### Framework versions
- Transformers 4.48.1
- Pytorch 2.6.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1