shb777/ceylia-initial

shb777 committed on Aug 17

Commit 6cb8072 · verified · 1 Parent(s): e0767d9

Update README.md

Files changed (1)

README.md +32 -3

@@ -1,3 +1,32 @@
----
-license: cc-by-nc-sa-4.0
----

+---
+license: cc-by-nc-sa-4.0
+language:
+- en
+pipeline_tag: text-to-audio
+---
+## ⚠️ Initial Checkpoint
+This is a Piper TTS model finetuned from [Kristin medium](https://huggingface.co/datasets/rhasspy/piper-checkpoints/tree/main/en/en_US/kristin/medium).
+This checkpoint was trained for just 5 epochs on ~30% of the total data I curated (synthetic + natural).
+<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/Z9hoY0Rww7NgYVDK_Gosv.wav"></audio>
+<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/1hcqStPtTGGCZLvyNvsh3.wav"></audio>
+<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/HTzdcRaB2VPG283zfA7W3.wav"></audio>
+<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/8bRegOeimX1A6VCyjQUW-.wav"></audio>
+<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/-ww0FdqtPPnTwZ2Kasl54.wav"></audio>
+I'm currently refining the dataset because I'm not satisfied with its quality. I will resume finetuning afterwards.
+I'm also running ablations to find the best ratio of synthetic to natural data.
+From initial observations, it seems better to use a large majority of one kind (a 90%-10% split).
+The goal is to push the boundaries of the audio a mere 63 MB model can generate.
+## 🙏 Acknowledgements
+- [Bryce Beattie](https://brycebeattie.com/files/tts/)
+- [Piper TTS](https://github.com/rhasspy/piper)
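
The README reports that a 90%-10% mix dominated by one kind of data looked better in early ablations, but it does not describe how the mixes are built. The sketch below is one possible way to assemble such a mix from two lists of clip paths. The function name `mix_manifest`, its parameters, and the file names in the usage example are all hypothetical and are not taken from the author's pipeline.

```python
import random


def mix_manifest(synthetic, natural, majority="synthetic",
                 majority_share=0.9, seed=0):
    """Return a shuffled list of clips in which the `majority` kind
    makes up roughly `majority_share` of the items.

    Hypothetical helper for illustration only; not the author's code.
    """
    if majority not in ("synthetic", "natural"):
        raise ValueError("majority must be 'synthetic' or 'natural'")
    if not 0.5 <= majority_share < 1.0:
        raise ValueError("majority_share must be in [0.5, 1.0)")

    major, minor = ((synthetic, natural) if majority == "synthetic"
                    else (natural, synthetic))
    rng = random.Random(seed)  # fixed seed so an ablation run is repeatable

    # Start by using every majority clip, then size the minority side so that
    # n_major / (n_major + n_minor) is approximately majority_share.
    n_major = len(major)
    n_minor = round(n_major * (1 - majority_share) / majority_share)

    # If there are too few minority clips, shrink the majority side instead.
    if n_minor > len(minor):
        n_minor = len(minor)
        n_major = min(len(major),
                      round(n_minor * majority_share / (1 - majority_share)))

    mixed = rng.sample(major, n_major) + rng.sample(minor, n_minor)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Placeholder file names, assumed purely for the example.
    syn = [f"syn_{i}.wav" for i in range(900)]
    nat = [f"nat_{i}.wav" for i in range(300)]
    clips = mix_manifest(syn, nat, majority="synthetic", majority_share=0.9)
    share = sum(c.startswith("syn_") for c in clips) / len(clips)
    print(len(clips), share)
```

Swapping `majority` between `"synthetic"` and `"natural"` while keeping the seed fixed gives two comparable mixes for an ablation of the kind described above.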