---
license: cc-by-nc-sa-4.0
language:
- en
pipeline_tag: text-to-speech
library_name: onnx
tags:
- piper
- tts
datasets:
- Jinsaryko/Elise
---

## ⚠️ Initial Checkpoint

This is a Piper TTS model finetuned from [Kristin medium](https://huggingface.co/datasets/rhasspy/piper-checkpoints/tree/main/en/en_US/kristin/medium).

This checkpoint was taken after just 5 epochs on ~30% of the total data I curated (synthetic + natural).

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/Z9hoY0Rww7NgYVDK_Gosv.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/1hcqStPtTGGCZLvyNvsh3.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/HTzdcRaB2VPG283zfA7W3.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/8bRegOeimX1A6VCyjQUW-.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/659be8bbb0f43ed69f17e7b8/-ww0FdqtPPnTwZ2Kasl54.wav"></audio>

Currently, I'm refining the synthetic dataset as I'm not satisfied with its quality. I will resume finetuning afterward.

I'm also running ablations to find the best ratio of synthetic to natural data.

From initial observations, it seems better to use a large majority of one kind (a 90%-10% split).
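
The ratio experiment above can be pictured with a small sketch. This is not the curation pipeline used for this model: `mix_by_ratio`, its parameters, and the clip names are all hypothetical. It only illustrates drawing a training mix in which one kind of data makes up roughly 90% of the samples.

```python
import random


def mix_by_ratio(majority, minority, majority_share=0.9, seed=0):
    """Return a shuffled mix in which `majority` items form about `majority_share` of the total.

    Hypothetical helper for illustration; not part of Piper or of this model's training code.
    """
    if not 0.0 < majority_share < 1.0:
        raise ValueError("majority_share must be strictly between 0 and 1")

    rng = random.Random(seed)

    # Largest total that both pools can supply at the requested ratio.
    # The small epsilon guards against float results such as 999.9999999.
    total = int(min(len(majority) / majority_share,
                    len(minority) / (1.0 - majority_share)) + 1e-9)

    # Clamp the counts so sampling never asks for more items than a pool holds.
    n_major = min(len(majority), round(total * majority_share))
    n_minor = min(len(minority), total - n_major)

    # Sample without replacement from each pool, then shuffle the combined list.
    mixed = rng.sample(majority, n_major) + rng.sample(minority, n_minor)
    rng.shuffle(mixed)
    return mixed


# Hypothetical clip identifiers standing in for real dataset entries.
synthetic_clips = [f"syn_{i}" for i in range(900)]
natural_clips = [f"nat_{i}" for i in range(200)]
training_mix = mix_by_ratio(synthetic_clips, natural_clips, majority_share=0.9)
```

Which kind of data plays the majority role is left open here, since the observation above only says that a lopsided split seems to work better than a balanced one.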

I'm trying to push the boundaries of the audio a mere 63 MB model can generate.

## Inference

```python
import wave

# Import path when running from the root of the Piper repo;
# import from the installed package instead if you used pip.
from src.python_run.piper import PiperVoice

# Load the ONNX voice model
model = PiperVoice.load("en_US-ceylia-medium.onnx")

text = "I have a big plan for today. It involves fine-tuning you."

with wave.open("output.wav", "wb") as output_file:
    output_file.setnchannels(1)      # mono
    output_file.setsampwidth(2)      # 16-bit samples
    output_file.setframerate(22050)  # 22.05 kHz sample rate
    # sentence_silence is the pause, in seconds, added after each sentence
    model.synthesize(text=text, wav_file=output_file, sentence_silence=0.25)
```
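
To sanity-check the file written by the snippet above, the WAV header can be read back with the standard library. This is a minimal sketch: `describe_wav` is a hypothetical helper, not part of Piper, and it uses only the `wave` module.

```python
import wave


def describe_wav(path):
    """Return the channel count, sample width (bytes), sample rate (Hz) and duration (s) of a WAV file."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width": wav_file.getsampwidth(),
            "sample_rate": sample_rate,
            # Duration is the number of frames divided by frames per second.
            "duration_s": frame_count / sample_rate,
        }
```

Calling `describe_wav("output.wav")` after synthesis should report one channel, a sample width of 2 bytes, and a sample rate of 22050 Hz if the file was written with the settings shown above. A duration of zero would suggest that no audio was produced.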

## 🙏 Acknowledgements

[Bryce Beattie](https://brycebeattie.com/files/tts/) for training the Kristin model.

Reference audio from datasets by [@Jinsaryko](https://huggingface.co/Jinsaryko).

[Piper TTS](https://github.com/rhasspy/piper)