SpeechT5-it

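SpeechT5-it is a fine-tuned version of microsoft/speecht5_tts, trained on the VoxPopuli dataset. On the evaluation set, the final checkpoint (epoch 40, step 28480) reaches a validation loss of 0.4600; the lowest logged value is 0.4588 at epoch 30.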

Model description

More information needed
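
Since no usage instructions are given, here is a minimal inference sketch using the SpeechT5 classes from the transformers library. Several things are assumptions rather than facts from this card: the repository id `your-username/SpeechT5-it` is a placeholder (the actual Hub path is not stated), `microsoft/speecht5_hifigan` is an assumed vocoder choice, and the processor files are assumed to have been saved alongside the fine-tuned weights. SpeechT5 conditions on a 512-dimensional speaker embedding (x-vector), which the caller has to supply.

```python
MODEL_ID = "your-username/SpeechT5-it"     # placeholder: the real Hub path is not given in this card
VOCODER_ID = "microsoft/speecht5_hifigan"  # assumption: the standard SpeechT5 HiFi-GAN vocoder
SAMPLE_RATE = 16_000                       # SpeechT5 generates 16 kHz audio


def synthesize(text, speaker_embedding, model_id=MODEL_ID, vocoder_id=VOCODER_ID):
    """Return a 1-D float tensor holding the waveform generated for `text`.

    `speaker_embedding` is expected to be a torch tensor of shape (1, 512).
    """
    # Heavy dependencies are imported lazily, so defining this function
    # does not require torch or transformers to be installed.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(model_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    vocoder = SpeechT5HifiGan.from_pretrained(vocoder_id)

    # Tokenize the text, then generate a spectrogram and vocode it in one call.
    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        return model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )
```

Given a suitable speaker embedding, the returned tensor could be written to disk with something like `soundfile.write("out.wav", waveform.numpy(), samplerate=SAMPLE_RATE)`. If the processor was not uploaded with the fine-tuned weights, loading it from microsoft/speecht5_tts is a reasonable fallback.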

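Training ran for 40 epochs of 712 steps each (28,480 steps in total); other hyperparameters are not listed. Training and validation loss were logged at the end of each epoch: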

Training Loss	Epoch	Step	Validation Loss
0.5641	1.0	712	0.5090
0.5394	2.0	1424	0.4915
0.5277	3.0	2136	0.4819
0.5136	4.0	2848	0.4798
0.5109	5.0	3560	0.4733
0.5078	6.0	4272	0.4731
0.5033	7.0	4984	0.4692
0.5021	8.0	5696	0.4691
0.4984	9.0	6408	0.4670
0.4880	10.0	7120	0.4641
0.491	11.0	7832	0.4641
0.4918	12.0	8544	0.4647
0.4933	13.0	9256	0.4622
0.499	14.0	9968	0.4619
0.4906	15.0	10680	0.4608
0.4884	16.0	11392	0.4622
0.4847	17.0	12104	0.4616
0.4916	18.0	12816	0.4592
0.4845	19.0	13528	0.4600
0.4788	20.0	14240	0.4594
0.4746	21.0	14952	0.4607
0.4875	22.0	15664	0.4615
0.4831	23.0	16376	0.4597
0.4798	24.0	17088	0.4595
0.4727	25.0	17800	0.4592
0.4736	26.0	18512	0.4598
0.4746	27.0	19224	0.4608
0.4728	28.0	19936	0.4589
0.4771	29.0	20648	0.4593
0.4743	30.0	21360	0.4588
0.4785	31.0	22072	0.4601
0.4757	32.0	22784	0.4597
0.4731	33.0	23496	0.4598
0.4746	34.0	24208	0.4593
0.4715	35.0	24920	0.4599
0.4769	36.0	25632	0.4622
0.4778	37.0	26344	0.4605
0.4798	38.0	27056	0.4594
0.4694	39.0	27768	0.4607
0.4680	40.0	28480	0.4600
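
The validation loss flattens out well before epoch 40, which the short sketch below makes explicit. It copies the Validation Loss column from the table, finds the epoch with the lowest value, and checks where a simple patience-based early-stopping rule would have halted. The patience rule is a hypothetical illustration; nothing here indicates that early stopping was used, or whether the published weights come from the final or the best checkpoint.

```python
from typing import List, Optional, Tuple

STEPS_PER_EPOCH = 712  # from the table: step 712 at epoch 1.0, step 28480 at epoch 40.0

# Validation Loss column of the table, epochs 1 through 40.
VAL_LOSS: List[float] = [
    0.5090, 0.4915, 0.4819, 0.4798, 0.4733, 0.4731, 0.4692, 0.4691, 0.4670, 0.4641,
    0.4641, 0.4647, 0.4622, 0.4619, 0.4608, 0.4622, 0.4616, 0.4592, 0.4600, 0.4594,
    0.4607, 0.4615, 0.4597, 0.4595, 0.4592, 0.4598, 0.4608, 0.4589, 0.4593, 0.4588,
    0.4601, 0.4597, 0.4598, 0.4593, 0.4599, 0.4622, 0.4605, 0.4594, 0.4607, 0.4600,
]


def best_epoch(losses: List[float]) -> Tuple[int, float]:
    """Return the (1-based epoch, loss) pair with the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]


def early_stop_epoch(losses: List[float], patience: int) -> Optional[int]:
    """Return the epoch at which training would stop, or None if it never would.

    Training stops once `patience` consecutive epochs fail to strictly
    improve on the best validation loss seen so far.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


epoch, loss = best_epoch(VAL_LOSS)
print(epoch, epoch * STEPS_PER_EPOCH, loss)     # 30 21360 0.4588
print(early_stop_epoch(VAL_LOSS, patience=5))   # 23
```

The lowest validation loss (0.4588, epoch 30) is only 0.0012 below the final value, and a patience of 5 would have stopped at epoch 23 with a best loss of 0.4592, so most of the later epochs add little on this metric.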

Model size: 0.1B params (Safetensors)
Tensor type: F32
