---
license: apache-2.0
---
## 👉🏻 WenetSpeech-Yue 👈🏻
**WenetSpeech-Yue**: [Demos](https://aslp-lab.github.io/WenetSpeech-Yue/); [Paper](https://arxiv.org/abs/2509.03959); [GitHub](https://github.com/ASLP-lab/WenetSpeech-Yue); [HuggingFace](https://huggingface.co/datasets/ASLP-lab/WenetSpeech-Yue)
## Highlight🔥
**WenetSpeech-Yue TTS Models** have been released!
This repository contains two versions of the TTS models:
1. **ASLP-lab/Cosyvoice2-Yue**: The base model for Cantonese TTS.
2. **ASLP-lab/Cosyvoice2-Yue-ZoengJyutGaai**: A fine-tuned, higher-quality version for more natural speech generation.
## Roadmap
- [x] 2025/9
- [x] 25 Hz WenetSpeech-Yue TTS models released
## Install
**Clone and install**
- Clone the repo
``` sh
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
# If cloning the submodules fails due to network errors, rerun the following commands until they succeed
cd CosyVoice
git submodule update --init --recursive
```
- Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
- Create Conda env:
``` sh
conda create -n cosyvoice python=3.10
conda activate cosyvoice
# pynini is required by WeTextProcessing; install it with conda because the conda package works on all platforms.
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
# If you encounter sox compatibility issues, install sox for your distribution:
# Ubuntu
sudo apt-get install sox libsox-dev
# CentOS
sudo yum install sox sox-devel
```
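Before moving on, it can help to confirm that the key packages ended up in the active environment. The following is a minimal sketch, not part of the official instructions: `check_modules` and `REQUIRED_MODULES` are hypothetical names, and the module list is an assumption based on the steps above (`pynini` from conda-forge, `torch` and `torchaudio` assumed to come from `requirements.txt`).

```python
import importlib.util

# Assumed module names: pynini comes from the conda-forge step above;
# torch and torchaudio are assumed to be installed by requirements.txt.
REQUIRED_MODULES = ["torch", "torchaudio", "pynini"]


def check_modules(names):
    """Return {module_name: True/False} by looking up each top-level module
    on the import path, without actually importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one status line per module.
    for name, found in check_modules(REQUIRED_MODULES).items():
        print("{:<12} {}".format(name, "ok" if found else "MISSING"))
```

Run it inside the activated `cosyvoice` environment; any module reported as `MISSING` suggests the corresponding install step did not complete.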
**Model download**
1. [Cosyvoice2-Yue](https://huggingface.co/ASLP-lab/Cosyvoice2-Yue)
2. [Cosyvoice2-Yue-ZoengJyutGaai](https://huggingface.co/ASLP-lab/Cosyvoice2-Yue-ZoengJyutGaai)
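One possible way to fetch both checkpoints is the `snapshot_download` function from the `huggingface_hub` package. This is a sketch under assumptions: it presumes `huggingface_hub` is installed, and the `pretrained_models/<name>` layout as well as the helper names `local_dir_for` and `download_models` are hypothetical choices, not something the repository prescribes.

```python
from pathlib import Path

# Repo IDs taken from the model links above.
MODEL_REPOS = [
    "ASLP-lab/Cosyvoice2-Yue",
    "ASLP-lab/Cosyvoice2-Yue-ZoengJyutGaai",
]


def local_dir_for(repo_id, root="pretrained_models"):
    """Map 'org/name' to '<root>/name' (a hypothetical local layout)."""
    return Path(root) / repo_id.split("/")[-1]


def download_models(root="pretrained_models"):
    """Download every repo in MODEL_REPOS and return the local directories."""
    # Imported lazily so local_dir_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id in MODEL_REPOS:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        paths.append(target)
    return paths
```

Calling `download_models()` from the `CosyVoice` directory would place the files under `pretrained_models/Cosyvoice2-Yue` and `pretrained_models/Cosyvoice2-Yue-ZoengJyutGaai`; those local paths can then be passed as the model directory when constructing `CosyVoice2`.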
**Basic Usage**
We strongly recommend using `CosyVoice2-0.5B` for better performance.
Follow the code below for detailed usage of each model.
``` python
import sys
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio
```
**CosyVoice2 Usage**
```python
cosyvoice = CosyVoice2('ASLP-lab/Cosyvoice2-Yue', load_jit=False, load_trt=False, fp16=False)
# NOTE: to reproduce the results on https://funaudiollm.github.io/cosyvoice2, pass text_frontend=False during inference
# load the prompt speech, resampled to 16 kHz
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
# instruct usage: each yielded item holds one generated segment in j['tts_speech']
for i, j in enumerate(cosyvoice.inference_instruct2('收到朋友从远方寄嚟嘅生日礼物,呢份意外嘅惊喜同埋满满嘅祝福令我内心充满咗甜蜜嘅快乐,个笑容就好似花咁咧盛开住。', '用粤语说这句话', prompt_speech_16k, stream=False)):
    torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
```
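The snippet above only exercises instruct mode. The upstream CosyVoice repository also provides `inference_zero_shot` on `CosyVoice2`, which takes the target text, the transcript of the prompt audio, and the 16 kHz prompt speech. The sketch below wraps that call; whether the Yue checkpoints perform well in zero-shot mode is an assumption that has not been verified here, and `synthesize_zero_shot` and `save_fn` are hypothetical names introduced for illustration.

```python
def synthesize_zero_shot(model, text, prompt_text, prompt_speech_16k,
                         save_fn, prefix="zero_shot"):
    """Run zero-shot synthesis and save each generated segment.

    model: a loaded CosyVoice2 instance, assumed to expose
        inference_zero_shot and sample_rate as in upstream CosyVoice.
    prompt_text: transcript of the prompt audio.
    save_fn: callable(path, waveform, sample_rate), e.g. torchaudio.save.
    Returns the list of file paths passed to save_fn.
    """
    paths = []
    outputs = model.inference_zero_shot(
        text, prompt_text, prompt_speech_16k, stream=False
    )
    for i, out in enumerate(outputs):
        path = "{}_{}.wav".format(prefix, i)
        save_fn(path, out["tts_speech"], model.sample_rate)
        paths.append(path)
    return paths
```

With the objects from the usage snippet, a call might look like `synthesize_zero_shot(cosyvoice, text, prompt_transcript, prompt_speech_16k, torchaudio.save)`, where `prompt_transcript` is the transcript of `zero_shot_prompt.wav`. Passing the save function as an argument keeps the helper free of a hard dependency on `torchaudio`.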
## Contact
To leave a message for our research team, feel free to email [email protected] or [email protected].