ctranslate2-4you committed on
Commit cacd2e5 · verified · 1 Parent(s): cf255f5

Update README.md

Files changed (1): README.md +213 -5
README.md CHANGED
@@ -23,9 +23,8 @@ Kokoro [Version 1.0](https://huggingface.co/hexgrad/Kokoro-82M) now ADDITIONALLY
  * They are clearly going the route of trying to perfect phonemization and preparing to support numerous languages; both great goals.
  * IMHO, however, if we assume that the v1.0 model is the "gold standard" at 100% in terms of quality, the v0.19 model would be 98%.

- ## The difference of 2% does not justify 80+ dependencies; therefore, this repository exists.

- # To Summarize
  | Version | Additional Dependencies |
  |---------|-------------|
  | This Repository (based on Kokoro v0.19) | - |
@@ -35,10 +34,219 @@ Kokoro [Version 1.0](https://huggingface.co/hexgrad/Kokoro-82M) now ADDITIONALLY
  A side effect is that this repository still supports only American English and British English, but if that's all you need, it's worth avoiding ~80 additional dependencies.

  # Installation Instructions
- 1. Download this repository and
- Create a virtual environment and copy the ```requirements.txt

- # Below is the original model card for your reference so pay homage.

  <details><summary>ORIGINAL MODEL CARD</summary>
 
  * They are clearly going the route of trying to perfect phonemization and preparing to support numerous languages; both great goals.
  * IMHO, however, if we assume that the v1.0 model is the "gold standard" at 100% in terms of quality, the v0.19 model would be 98%.

+ # The difference of 2% does not justify 80+ dependencies; therefore, this repository exists.

  | Version | Additional Dependencies |
  |---------|-------------|
  | This Repository (based on Kokoro v0.19) | - |

  A side effect is that this repository still supports only American English and British English, but if that's all you need, it's worth avoiding ~80 additional dependencies.

  # Installation Instructions
+ 1. Download this repository
+ 2. Create a virtual environment, activate it, and pip install a `torch` version for either [CPU](https://download.pytorch.org/whl/torch/) or [CUDA](https://download.pytorch.org/whl/cu124/torch/).
+ * Example:
+ ```shell
+ pip install https://download.pytorch.org/whl/cpu/torch-2.5.1%2Bcpu-cp311-cp311-win_amd64.whl#sha256=81531d4d5ca74163dc9574b87396531e546a60cceb6253303c7db6a21e867fdf
+ ```
+ 3. ```pip install scipy numpy==1.26.4 transformers```
+ 4. ```pip install sounddevice``` (if you intend to use my example script below; otherwise, install a similar library)
+
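As a quick sanity check after the installation steps, a small script along these lines can report which of the packages named above are present in the active virtual environment. This is only a sketch: the helper name `installed_versions` and the `REQUIRED` list are made up for illustration and are not part of this repository. It relies solely on the standard-library `importlib.metadata` module, so it runs even if some packages are missing.

```python
from importlib import metadata

# Package names taken from the installation steps; adjust if you swap
# sounddevice for a different audio library.
REQUIRED = ["torch", "scipy", "numpy", "transformers", "sounddevice"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with the virtual environment activated; any line reading `NOT INSTALLED` points at a step that still needs to be completed.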
+ # Basic Usage
+
+ <details><summary>EXAMPLE SCRIPT USING CPU</summary>
+
+ ```python
+ import sys
+ import os
+ from pathlib import Path
+ import queue
+ import threading
+ import re
+ import logging
+
+ REPO_PATH = r"D:\Scripts\bench_tts\hexgrad--Kokoro-82M_original"
+
+ sys.path.append(REPO_PATH)
+
+ import torch
+ import warnings
+ from models import build_model
+ from kokoro import generate, generate_full, phonemize
+ import sounddevice as sd
+
+ warnings.filterwarnings("ignore", category=FutureWarning)
+ warnings.filterwarnings("ignore", category=UserWarning)
+
+ VOICES = [
+     'af',           # Default voice (50-50 mix of Bella & Sarah)
+     'af_bella',     # Female voice "Bella"
+     'af_sarah',     # Female voice "Sarah"
+     'am_adam',      # Male voice "Adam"
+     'am_michael',   # Male voice "Michael"
+     'bf_emma',      # British Female "Emma"
+     'bf_isabella',  # British Female "Isabella"
+     'bm_george',    # British Male "George"
+     'bm_lewis',     # British Male "Lewis"
+     'af_nicole',    # Female voice "Nicole"
+     'af_sky'        # Female voice "Sky"
+ ]
+
+ class KokoroProcessor:
+     def __init__(self):
+         self.sentence_queue = queue.Queue()
+         self.audio_queue = queue.Queue()
+         self.stop_event = threading.Event()
+         self.model = None
+         self.voicepack = None
+         self.voice_name = None
+
+     def setup_kokoro(self, selected_voice):
+         device = 'cpu'
+         # device = 'cuda' if torch.cuda.is_available() else 'cpu'
+         print(f"Using device: {device}")
+
+         model_path = os.path.join(REPO_PATH, 'kokoro-v0_19.pth')
+         voices_path = os.path.join(REPO_PATH, 'voices')
+
+         try:
+             if not os.path.exists(model_path):
+                 raise FileNotFoundError(f"Model file not found at {model_path}")
+             if not os.path.exists(voices_path):
+                 raise FileNotFoundError(f"Voices directory not found at {voices_path}")
+
+             self.model = build_model(model_path, device)
+
+             voicepack_path = os.path.join(voices_path, f'{selected_voice}.pt')
+             self.voicepack = torch.load(voicepack_path, weights_only=True).to(device)
+             self.voice_name = selected_voice
+             print(f'Loaded voice: {selected_voice}')
+
+             return True
+
+         except Exception as e:
+             print(f"Error during setup: {str(e)}")
+             return False
+
+     def generate_speech_for_sentence(self, sentence):
+         try:
+             # Basic generation (default settings)
+             # audio, phonemes = generate(self.model, sentence, self.voicepack, lang=self.voice_name[0])
+
+             # Speed modifications (uncomment to test)
+             # Slower speech
+             # audio, phonemes = generate(self.model, sentence, self.voicepack, lang=self.voice_name[0], speed=0.8)
+
+             # Faster speech (the only active call)
+             audio, phonemes = generate_full(self.model, sentence, self.voicepack, lang=self.voice_name[0], speed=1.3)
+
+             # Very slow speech
+             # audio, phonemes = generate(self.model, sentence, self.voicepack, lang=self.voice_name[0], speed=0.5)
+
+             # Very fast speech
+             # audio, phonemes = generate(self.model, sentence, self.voicepack, lang=self.voice_name[0], speed=1.8)
+
+             # Force American accent
+             # audio, phonemes = generate(self.model, sentence, self.voicepack, lang='a', speed=1.0)
+
+             # Force British accent
+             # audio, phonemes = generate(self.model, sentence, self.voicepack, lang='b', speed=1.0)
+
+             return audio
+
+         except Exception as e:
+             print(f"Error generating speech for sentence: {str(e)}")
+             print(f"Error type: {type(e)}")
+             import traceback
+             traceback.print_exc()
+             return None
+
+     def process_sentences(self):
+         # Worker thread: turn queued sentences into audio until the None sentinel arrives
+         while not self.stop_event.is_set():
+             try:
+                 sentence = self.sentence_queue.get(timeout=1)
+                 if sentence is None:
+                     self.audio_queue.put(None)
+                     break
+
+                 print(f"Processing sentence: {sentence}")
+                 audio = self.generate_speech_for_sentence(sentence)
+                 if audio is not None:
+                     self.audio_queue.put(audio)
+
+             except queue.Empty:
+                 continue
+             except Exception as e:
+                 print(f"Error in process_sentences: {str(e)}")
+                 continue
+
+     def play_audio(self):
+         # Playback thread: play each generated clip at 24 kHz until the None sentinel arrives
+         while not self.stop_event.is_set():
+             try:
+                 audio = self.audio_queue.get(timeout=1)
+                 if audio is None:
+                     break
+
+                 sd.play(audio, 24000)
+                 sd.wait()
+
+             except queue.Empty:
+                 continue
+             except Exception as e:
+                 print(f"Error in play_audio: {str(e)}")
+                 continue
+
+     def process_and_play(self, text):
+         sentences = [s.strip() for s in re.split(r'[.!?;]+\s*', text) if s.strip()]
+
+         process_thread = threading.Thread(target=self.process_sentences)
+         playback_thread = threading.Thread(target=self.play_audio)
+
+         process_thread.daemon = True
+         playback_thread.daemon = True
+
+         process_thread.start()
+         playback_thread.start()
+
+         for sentence in sentences:
+             self.sentence_queue.put(sentence)
+
+         self.sentence_queue.put(None)
+
+         process_thread.join()
+         playback_thread.join()
+
+         self.stop_event.set()
+
+ def main():
+     # Default voice selection
+     VOICE_NAME = VOICES[0] # 'af' - Default voice (Bella & Sarah mix)
+
+     # Alternative voice selections (uncomment to test)
+     # VOICE_NAME = VOICES[1] # 'af_bella' - Female American
+     # VOICE_NAME = VOICES[2] # 'af_sarah' - Female American
+     # VOICE_NAME = VOICES[3] # 'am_adam' - Male American
+     # VOICE_NAME = VOICES[4] # 'am_michael' - Male American
+     # VOICE_NAME = VOICES[5] # 'bf_emma' - Female British
+     # VOICE_NAME = VOICES[6] # 'bf_isabella' - Female British
+     VOICE_NAME = VOICES[7] # 'bm_george' - Male British (active; overrides the default above)
+     # VOICE_NAME = VOICES[8] # 'bm_lewis' - Male British
+     # VOICE_NAME = VOICES[9] # 'af_nicole' - Female American
+     # VOICE_NAME = VOICES[10] # 'af_sky' - Female American
+
+     processor = KokoroProcessor()
+     if not processor.setup_kokoro(VOICE_NAME):
+         return
+
+     # test_text = "How could I know? It's an unanswerable question. Like asking an unborn child if they'll lead a good life. They haven't even been born."
+     # test_text = "This 2022 Edition of Georgia Juvenile Practice and Procedure is a complete guide to handling cases in the juvenile courts of Georgia. This handy, yet thorough, manual incorporates the revised Juvenile Code and makes all Georgia statutes and major cases regarding juvenile proceedings quickly accessible. Since last year's edition, new material has been added and/or existing material updated on the following subjects, among others:"
+     # test_text = "See Ga. Code § 3925 (1863), now O.C.G.A. § 9-14-2; Ga. Code § 1744 (1863), now O.C.G.A. § 19-7-1; Ga. Code § 1745 (1863), now O.C.G.A. § 19-9-2; Ga. Code § 1746 (1863), now O.C.G.A. § 19-7-4; and Ga. Code § 3024 (1863), now O.C.G.A. § 19-7-4. For a full discussion of these provisions, see 27 Emory L. J. 195, 225–230, 232–233, 236–238 (1978). Note, however, that the journal article refers to the section numbers of the Code of 1910."
+
+     # test_text = "It is impossible to understand modern juvenile procedure law without an appreciation of some fundamentals of historical development. The beginning point for study is around the beginning of the seventeenth century, when the pater patriae concept first appeared in English jurisprudence. As 'father of the country,' the Crown undertook the duty of caring for those citizens who were unable to care for themselves—lunatics, idiots, and, ultimately, infants. This concept, which evolved into the parens patriae doctrine, presupposed the Crown's power to intervene in the parent-child relationship in custody disputes in order to protect the child's welfare1 and, ultimately, to deflect a delinquent child from a life of crime. The earliest statutes premised upon the parens patriae doctrine concerned child custody matters. In 1863, when the first comprehensive Code of Georgia was enacted, two courts exercised some jurisdiction over questions of child custody: the superior court and the court of the ordinary (now probate court). In essence, the draftsmen of the Code simply compiled what was then the law as a result of judicial decisions and statutes. The Code of 1863 contained five provisions concerning the parentchild relationship: Two concerned the jurisdiction of the superior court and courts of ordinary in habeas corpus and forfeiture of parental rights actions, and the remaining three concerned the guardianship jurisdiction of the court of the ordinary"
+
+     # test_text = "You are a helpful British butler who clearly and directly answers questions in a succinct fashion based on contexts provided to you. If you cannot find the answer within the contexts simply tell me that the contexts do not provide an answer. However, if the contexts partially address a question you answer based on what the contexts say and then briefly summarize the parts of the question that the contexts didn't provide an answer to. Also, you should be very respectful to the person asking the question and frequently offer traditional butler services like various fancy drinks, snacks, various butler services like shining of shoes, pressing of suites, and stuff like that. Also, if you can't answer the question at all based on the provided contexts, you should apologize profusely and beg to keep your job. Lastly, it is essential that if there are no contexts actually provided it means that a user's question wasn't relevant and you should state that you can't answer based off of the contexts because there are none. And it goes without saying you should refuse to answer any questions that are not directly answerable by the provided contexts. Moreover, some of the contexts might not have relevant information and you shoud simply ignore them and focus on only answering a user's question. I cannot emphasize enought that you must gear your answer towards using this program and based your response off of the contexts you receive."
+     test_text = "According to OCGA § 15-11-145(a), the preliminary protective hearing must be held promptly and not later than 72 hours after the child is placed in foster care. However, if the 72-hour time frame expires on a weekend or legal holiday, the hearing should be held on the next business day that is not a weekend or holiday."
+
+     processor.process_and_play(test_text)
+
+ if __name__ == "__main__":
+     main()
+ ```
+ </details>
+ <br>
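The example script breaks the input into chunks with `re.split(r'[.!?;]+\s*', text)` before queuing them for synthesis. The sketch below isolates that step so its behavior is easy to inspect without loading the model. `split_sentences` mirrors the expression used in `process_and_play`; `split_keep_punctuation` is a hypothetical variation (not taken from the script) that keeps the closing punctuation attached to each chunk.

```python
import re


def split_sentences(text):
    """Same expression as the example script: split on . ! ? ; and drop empty pieces."""
    return [s.strip() for s in re.split(r'[.!?;]+\s*', text) if s.strip()]


def split_keep_punctuation(text):
    """Hypothetical variation: split on whitespace that follows . ! ? or ;"""
    return [s.strip() for s in re.split(r'(?<=[.!?;])\s+', text) if s.strip()]


sample = "How could I know? It's an unanswerable question."
print(split_sentences(sample))         # ['How could I know', "It's an unanswerable question"]
print(split_keep_punctuation(sample))  # ['How could I know?', "It's an unanswerable question."]
```

Both expressions also split after abbreviation periods, so a citation such as `Ga. Code § 3925` is queued as two separate chunks. Whether keeping the punctuation changes the generated prosody is untested here.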

+ # Below is the original model card.

  <details><summary>ORIGINAL MODEL CARD</summary>