shb777 committed on
Commit 6cb8072 · verified · 1 Parent(s): e0767d9

Update README.md

  1. README.md +32 -3
README.md CHANGED
@@ -1,3 +1,32 @@
- ---
- license: cc-by-nc-sa-4.0
- ---
+ ---
+ license: cc-by-nc-sa-4.0
+ language:
+ - en
+ pipeline_tag: text-to-audio
+ ---
+
+ ## ⚠️ Initial Checkpoint
+
+ This is a Piper TTS model finetuned from [Kristin medium](https://huggingface.co/datasets/rhasspy/piper-checkpoints/tree/main/en/en_US/kristin/medium).
+
+ This checkpoint comes from just 5 epochs of training on ~30% of the total data I curated (synthetic + natural).
+
+ <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/Z9hoY0Rww7NgYVDK_Gosv.wav"></audio>
+ <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/1hcqStPtTGGCZLvyNvsh3.wav"></audio>
+ <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/HTzdcRaB2VPG283zfA7W3.wav"></audio>
+ <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/8bRegOeimX1A6VCyjQUW-.wav"></audio>
+ <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/-ww0FdqtPPnTwZ2Kasl54.wav"></audio>
+
+ I'm currently refining the dataset, as I'm not satisfied with its quality, and will resume finetuning afterwards.
+
+ I'm also running ablations to find the best ratio of synthetic to natural data.
+
+ From initial observations, it seems better to use a large majority of one kind (90%-10%).
+
+ The goal is to push the boundaries of the audio a mere 63 MB model can generate.
+
+ ## 🙏 Acknowledgements
+
+ [Bryce Beattie](https://brycebeattie.com/files/tts/)
+
+ [Piper TTS](https://github.com/rhasspy/piper)
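
Note on the 90%-10% observation in the README: the snippet below is a minimal sketch of how such a mix could be assembled for an ablation run. It is not the author's pipeline. The function name `mix_by_ratio`, its parameters, and the `("syn", i)` / `("nat", i)` sample tuples are all hypothetical, and real entries would presumably be audio/transcript pairs.

```python
import random


def mix_by_ratio(majority, minority, majority_share=0.9, seed=0):
    """Combine two sample lists so `majority` makes up about `majority_share`.

    Hypothetical helper: keeps every majority item and draws only as many
    minority items as the requested share allows.
    """
    if not 0.0 < majority_share < 1.0:
        raise ValueError("majority_share must be strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed so an ablation run is repeatable
    # Minority count that leaves the majority at the requested share.
    wanted = round(len(majority) * (1.0 - majority_share) / majority_share)
    wanted = min(wanted, len(minority))  # cannot draw more than is available
    mixed = list(majority) + rng.sample(list(minority), wanted)
    rng.shuffle(mixed)
    return mixed


# Placeholder samples standing in for synthetic and natural utterances.
synthetic = [("syn", i) for i in range(90)]
natural = [("nat", i) for i in range(50)]
mixed = mix_by_ratio(synthetic, natural, majority_share=0.9)
```

Swapping the two arguments gives the opposite mix (mostly natural, some synthetic), so both ends of the ablation can be produced with the same helper.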