---
language: "en"
inference: false
tags:
- Vocoder
- HiFIGAN
- speech-synthesis
- speechbrain
license: "apache-2.0"
datasets:
- LibriTTS
---

<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
<br/><br/>

# Vocoder with HiFiGAN Unit trained on LibriTTS

This repository provides all the necessary tools for using a [scalable HiFiGAN Unit](https://arxiv.org/abs/2406.10735) vocoder trained on [LibriTTS](https://www.openslr.org/141/).

The pre-trained model takes discrete self-supervised representations as input and produces a waveform as output. This makes it suitable for a wide range of generative tasks such as speech enhancement, separation, text-to-speech, and voice cloning. Please read [DASB - Discrete Audio and Speech Benchmark](https://arxiv.org/abs/2406.14294) for more information.
To generate the discrete self-supervised representations, we employ a K-means clustering model trained on the hidden layers ([1, 3, 7, 12, 18, 23]) of `microsoft/wavlm-large`, with k=1000.

## Install SpeechBrain

First of all, please install transformers and SpeechBrain with the following command:

```
pip install speechbrain transformers
```

Please note that we encourage you to read our tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).

### Using the Vocoder with DiscreteSSL

```python
import torch
from speechbrain.lobes.models.huggingface_transformers.hubert import HuBERT
from speechbrain.lobes.models.huggingface_transformers.discrete_ssl import DiscreteSSL

# Dummy input batch: 3 waveforms of 2000 samples (replace with real 16 kHz audio).
inputs = torch.rand([3, 2000])
model_hub = "facebook/hubert-large-ll60k"
save_path = "savedir"
ssl_layer_num = [7, 23]        # SSL hidden layers to tokenize
deduplicate = [False, True]    # whether to remove repeated tokens, per layer
bpe_tokenizers = [None, None]  # no BPE tokenizer, per layer
vocoder_repo_id = "speechbrain/hifigan-hubert-k1000-LibriTTS"
kmeans_dataset = "LibriSpeech"
num_clusters = 1000

# SSL encoder, K-means quantizer, and vocoder wrapped in a single interface.
ssl_model = HuBERT(model_hub, save_path, output_all_hiddens=True)
model = DiscreteSSL(
    save_path, ssl_model, vocoder_repo_id=vocoder_repo_id,
    kmeans_dataset=kmeans_dataset, num_clusters=num_clusters,
)
tokens, _, _ = model.encode(
    inputs, SSL_layers=ssl_layer_num, deduplicates=deduplicate, bpe_tokenizers=bpe_tokenizers,
)
sig = model.decode(tokens, ssl_layer_num)
```
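
To listen to or inspect the reconstructed signal, a minimal sketch along these lines should work. It assumes `torchaudio` is installed and that the vocoder output is sampled at 16 kHz; check the model's hyperparameters for the exact sampling rate, and inspect `sig.shape` on your setup, since the output may carry an extra channel dimension.

```python
import torchaudio

# `sig` comes from the example above; torchaudio.save expects [channels, time].
# Flattening the first utterance to a single channel is an assumption; adapt
# it to the actual output shape if needed.
first = sig[0].detach().reshape(1, -1).cpu()
torchaudio.save("reconstructed.wav", first, 16000)  # 16 kHz assumed
```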

### Standalone Vocoder Usage

```python
import torch
from speechbrain.inference.vocoders import UnitHIFIGAN

hifi_gan_unit = UnitHIFIGAN.from_hparams(source="speechbrain/hifigan-hubert-k1000-LibriTTS", savedir="pretrained_models/vocoder")
# Dummy unit sequence: random integers standing in for real K-means tokens.
codes = torch.randint(0, 99, (100, 1))
waveform = hifi_gan_unit.decode_unit(codes)
```

### Inference on GPU
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
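
For example, here is a minimal sketch of the standalone vocoder on GPU (it assumes a CUDA device is available; the random codes are placeholders, as above):

```python
import torch
from speechbrain.inference.vocoders import UnitHIFIGAN

# Load the vocoder directly on the GPU via run_opts.
hifi_gan_unit = UnitHIFIGAN.from_hparams(
    source="speechbrain/hifigan-hubert-k1000-LibriTTS",
    savedir="pretrained_models/vocoder",
    run_opts={"device": "cuda"},
)

# Dummy discrete units, created on the same device as the model.
codes = torch.randint(0, 99, (100, 1), device="cuda")
waveform = hifi_gan_unit.decode_unit(codes)
```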

### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

#### Referencing SpeechBrain

```
@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
}
```

#### About SpeechBrain
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.

Website: https://speechbrain.github.io/

GitHub: https://github.com/speechbrain/speechbrain