---
license: apache-2.0
datasets:
- simon3000/genshin-voice
language:
- zh
base_model:
- SparkAudio/Spark-TTS-0.5B
pipeline_tag: text-to-speech
tags:
- chinese
- spark-tts
- genshin
- float16
---

# Spark-TTS fine-tuned on Genshin character voices

* GitHub code: https://github.com/nonwesjoe/genshin-sparktts

* Kaggle notebook: https://www.kaggle.com/code/suziwsz/genshin-sparktts/

## Available characters

* paimon, hutao, furina, kazuha, xiao, mona, ganyu, xiangling, shotgun, citlali, barbara, zhongli, venti, nahida, kaeya, yaoyao, yoimiya, nilou (each character has its own fully fine-tuned model)

# Usage

* Python 3.12 is suggested.

* Clone the repository: <code>git clone https://github.com/nonwesjoe/genshin-sparktts.git && cd genshin-sparktts</code>

* If CUDA is available, install torch 2.7.1 from the CUDA 11.8 wheel index: <code>pip install torch torchaudio torchvision -i https://download.pytorch.org/whl/cu118/</code>

  Otherwise, install torch 2.7.1 for CPU: <code>pip install torch torchaudio torchvision -i https://download.pytorch.org/whl/cpu</code>

* Install the other requirements: <code>pip install -r requirements.txt</code>

* Set the following environment variables in the terminal:

```
export CHARACTOR=nahida # or any other available character
export MODEL_PATH=/kaggle/working/genshin/ # your model path
export INPUT_TEXT="楼下发荔枝了吗?那我们快去领取!" # text to convert to speech
```

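To make the role of these three variables concrete, here is a minimal sketch of how a script could read and validate them. It is not the code in `download.py` or `run.py`, whose internals are not shown here. The function name `load_settings`, the `./genshin` fallback for `MODEL_PATH`, and the validation rules are assumptions for illustration.

```python
import os

# Character names as listed in this README.
AVAILABLE_CHARACTERS = (
    "paimon", "hutao", "furina", "kazuha", "xiao", "mona", "ganyu",
    "xiangling", "shotgun", "citlali", "barbara", "zhongli", "venti",
    "nahida", "kaeya", "yaoyao", "yoimiya", "nilou",
)


def load_settings(env=None):
    """Read CHARACTOR, MODEL_PATH and INPUT_TEXT from a mapping.

    Hypothetical helper: defaults to os.environ when no mapping is given.
    """
    env = os.environ if env is None else env

    character = env.get("CHARACTOR", "")
    if character not in AVAILABLE_CHARACTERS:
        raise ValueError(f"unknown CHARACTOR {character!r}")

    # Assumed fallback; the README says models are downloaded to ./genshin.
    model_path = env.get("MODEL_PATH", "./genshin")

    text = env.get("INPUT_TEXT", "")
    if not text:
        raise ValueError("INPUT_TEXT is empty")

    return {"character": character, "model_path": model_path, "text": text}


if __name__ == "__main__":
    try:
        print(load_settings())
    except ValueError as err:
        print(f"configuration problem: {err}")
```

Checking the values up front like this surfaces a misspelled character name before any model files are downloaded or loaded.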
* Download the model files. By default, this downloads only the model for the character set in the CHARACTOR environment variable and saves it to ./genshin: <code>python3 download.py</code>

* Convert the text to audio. The output is written to sparktts.wav: <code>python3 run.py</code>

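After the last step, a quick sanity check on the generated file can be useful. The sketch below uses only Python's standard `wave` module. It assumes `sparktts.wav` is in the current working directory and is a plain PCM WAV file. If the file uses another encoding, such as floating-point samples, `wave.open` may raise `wave.Error`. The helper name `describe_wav` is hypothetical.

```python
import os
import wave


def describe_wav(path="sparktts.wav"):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return rate, channels, frames / rate


if __name__ == "__main__":
    if os.path.exists("sparktts.wav"):
        rate, channels, seconds = describe_wav()
        print(f"{rate} Hz, {channels} channel(s), {seconds:.2f} s")
    else:
        print("sparktts.wav not found; run run.py first")
```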
# Detail

* The model was trained in float32 but saved as float16 to reduce VRAM and storage usage.

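The trade-off behind that choice can be illustrated with the standard library alone: float16 takes half the bytes of float32 per value, at the cost of some rounding. This is only a sketch of the arithmetic, not how the checkpoints were actually converted. The parameter count below is a round illustrative figure suggested by the "0.5B" in the base model's name, not a measured value.

```python
import struct


def storage_bytes(num_values, fmt):
    """Bytes needed to store num_values numbers in the given struct format."""
    return num_values * struct.calcsize(fmt)


def roundtrip_float16(value):
    """Store a Python float as IEEE 754 half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    n = 500_000_000  # illustrative parameter count (assumption)
    print("float32:", storage_bytes(n, "<f") / 1e9, "GB")
    print("float16:", storage_bytes(n, "<e") / 1e9, "GB")

    # Rounding error introduced by storing a weight in half precision.
    w = 0.1
    print("float16 round-trip error for 0.1:", abs(roundtrip_float16(w) - w))
```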
# Example

[▶ Furina(芙宁娜) play](https://raw.githubusercontent.com/nonwesjoe/genshin-sparktts/main/examples/furina.wav)

[▶ Kazuha(万叶) play](https://raw.githubusercontent.com/nonwesjoe/genshin-sparktts/main/examples/kazuha.wav)

[▶ Paimon(派蒙) play](https://raw.githubusercontent.com/nonwesjoe/genshin-sparktts/main/examples/paimon.wav)

[▶ Hutao(胡桃) play](https://raw.githubusercontent.com/nonwesjoe/genshin-sparktts/main/examples/hutao.wav)

[▶ Xiao(魈) play](https://raw.githubusercontent.com/nonwesjoe/genshin-sparktts/main/examples/xiao.wav)

[▶ Citlali(茜特菈莉) play](https://raw.githubusercontent.com/nonwesjoe/genshin-sparktts/main/examples/citlali.wav)

# Acknowledgement

* [Spark-TTS](https://github.com/SparkAudio/Spark-TTS)

* [Genshin dataset](https://huggingface.co/datasets/simon3000/genshin-voice)

* [Unsloth](https://github.com/unslothai/unsloth)