ctranslate2-4you
/

Kokoro-82M-light

Text-to-Speech

English

ctranslate2-4you committed on Jan 28

Commit

cacd2e5

verified ·

1 Parent(s): cf255f5

Update README.md


README.md +213 -5

README.md CHANGED

@@ -23,9 +23,8 @@ Kokoro [Version 1.0](https://huggingface.co/hexgrad/Kokoro-82M) now ADDITIONALLY
   * They are clearly going the route of trying to perfect phonemization and preparing to support numerous languages; both great goals.
   * IMHO, however, if we assume that the v1.0 model is the "gold standard" at 100% in terms of quality, the v0.19 model would be 98%.
-## The difference of 2% does not justify 80+ dependencies; therefore, this repository exists.
-# To Summarize
 | Version | Additional Dependencies |
 |---------|-------------|
 | This Repository (based on Kokoro v0.19) | - |
@@ -35,10 +34,219 @@ Kokoro [Version 1.0](https://huggingface.co/hexgrad/Kokoro-82M) now ADDITIONALLY
 A side effect is that this repository still supports only American and British English, but if that's all you need, it's worth avoiding ~80 additional dependencies.
 # Installation Instructions
-1. Download this repository and
-Create a virtual environment and copy the ```requirements.txt
-# Below is the original model card for your reference so pay homage.
 <details><summary>ORIGINAL MODEL CARD</summary>

   * They are clearly going the route of trying to perfect phonemization and preparing to support numerous languages; both great goals.
   * IMHO, however, if we assume that the v1.0 model is the "gold standard" at 100% in terms of quality, the v0.19 model would be 98%.
+# The difference of 2% does not justify 80+ dependencies; therefore, this repository exists.
 | Version | Additional Dependencies |
 |---------|-------------|
 | This Repository (based on Kokoro v0.19) | - |
 A side effect is that this repository still supports only American and British English, but if that's all you need, it's worth avoiding ~80 additional dependencies.
 # Installation Instructions
+1. Download this repository
+2. Create a virtual environment, activate it, and pip install a `torch` version for either [CPU](https://download.pytorch.org/whl/torch/) or [CUDA](https://download.pytorch.org/whl/cu124/torch/).
+  * Example:
+```bash
+pip install https://download.pytorch.org/whl/cpu/torch-2.5.1%2Bcpu-cp311-cp311-win_amd64.whl#sha256=81531d4d5ca74163dc9574b87396531e546a60cceb6253303c7db6a21e867fdf
+```
+3. ```pip install scipy numpy==1.26.4 transformers```
+4. ```pip install sounddevice``` (if you intend to use my example script below; otherwise, install a similar library)
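Before running the example script, it can help to confirm that the packages from the steps above are importable in the active virtual environment. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and not part of this repository.

```python
import importlib.util

# Import names of the packages installed in the steps above.
REQUIRED = ["torch", "scipy", "numpy", "transformers", "sounddevice"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that each package can be found; it does not verify versions such as the `numpy==1.26.4` pin.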
+# Basic Usage
+<details><summary>EXAMPLE SCRIPT USING CPU</summary>
+```python
+import sys
+import os
+from pathlib import Path
+import queue
+import threading
+import re
+import logging
+REPO_PATH = r"D:\Scripts\bench_tts\hexgrad--Kokoro-82M_original"
+sys.path.append(REPO_PATH)
+import torch
+import warnings
+from models import build_model
+from kokoro import generate, generate_full, phonemize
+import sounddevice as sd
+warnings.filterwarnings("ignore", category=FutureWarning)
+warnings.filterwarnings("ignore", category=UserWarning)
+VOICES = [
+   'af',        # Default voice (50-50 mix of Bella & Sarah)
+   'af_bella',  # Female voice "Bella"
+   'af_sarah',  # Female voice "Sarah"
+   'am_adam',   # Male voice "Adam"
+   'am_michael',# Male voice "Michael"
+   'bf_emma',   # British Female "Emma"
+   'bf_isabella',# British Female "Isabella"
+   'bm_george', # British Male "George"
+   'bm_lewis',  # British Male "Lewis"
+   'af_nicole', # Female voice "Nicole"
+   'af_sky'     # Female voice "Sky"
+]
+class KokoroProcessor:
+   def __init__(self):
+       self.sentence_queue = queue.Queue()
+       self.audio_queue = queue.Queue()
+       self.stop_event = threading.Event()
+       self.model = None
+       self.voicepack = None
+       self.voice_name = None
+   def setup_kokoro(self, selected_voice):
+       device = 'cpu'
+       # device = 'cuda' if torch.cuda.is_available() else 'cpu'
+       print(f"Using device: {device}")
+       model_path = os.path.join(REPO_PATH, 'kokoro-v0_19.pth')
+       voices_path = os.path.join(REPO_PATH, 'voices')
+       try:
+           if not os.path.exists(model_path):
+               raise FileNotFoundError(f"Model file not found at {model_path}")
+           if not os.path.exists(voices_path):
+               raise FileNotFoundError(f"Voices directory not found at {voices_path}")
+           self.model = build_model(model_path, device)
+           voicepack_path = os.path.join(voices_path, f'{selected_voice}.pt')
+           self.voicepack = torch.load(voicepack_path, weights_only=True).to(device)
+           self.voice_name = selected_voice
+           print(f'Loaded voice: {selected_voice}')
+           return True
+       except Exception as e:
+           print(f"Error during setup: {str(e)}")
+           return False
+   def generate_speech_for_sentence(self, sentence):
+       try:
+           # Basic generation (default settings)
+           # audio, phonemes = generate(self.model, sentence, self.voicepack, lang=self.voice_name[0])
+           # Speed modifications (uncomment to test)
+           # Slower speech
+           # audio, phonemes = generate(self.model, sentence, self.voicepack, lang=self.voice_name[0], speed=0.8)
+           # Faster speech
+           audio, phonemes = generate_full(self.model, sentence, self.voicepack, lang=self.voice_name[0], speed=1.3)
+           # Very slow speech
+           #audio, phonemes = generate(self.model, sentence, self.voicepack, lang=self.voice_name[0], speed=0.5)
+           # Very fast speech
+           #audio, phonemes = generate(self.model, sentence, self.voicepack, lang=self.voice_name[0], speed=1.8)
+           # Force American accent
+           # audio, phonemes = generate(self.model, sentence, self.voicepack, lang='a', speed=1.0)
+           # Force British accent
+           # audio, phonemes = generate(self.model, sentence, self.voicepack, lang='b', speed=1.0)
+           return audio
+       except Exception as e:
+           print(f"Error generating speech for sentence: {str(e)}")
+           print(f"Error type: {type(e)}")
+           import traceback
+           traceback.print_exc()
+           return None
+   def process_sentences(self):
+       while not self.stop_event.is_set():
+           try:
+               sentence = self.sentence_queue.get(timeout=1)
+               if sentence is None:
+                   self.audio_queue.put(None)
+                   break
+               print(f"Processing sentence: {sentence}")
+               audio = self.generate_speech_for_sentence(sentence)
+               if audio is not None:
+                   self.audio_queue.put(audio)
+           except queue.Empty:
+               continue
+           except Exception as e:
+               print(f"Error in process_sentences: {str(e)}")
+               continue
+   def play_audio(self):
+       while not self.stop_event.is_set():
+           try:
+               audio = self.audio_queue.get(timeout=1)
+               if audio is None:
+                   break
+               sd.play(audio, 24000)
+               sd.wait()
+           except queue.Empty:
+               continue
+           except Exception as e:
+               print(f"Error in play_audio: {str(e)}")
+               continue
+   def process_and_play(self, text):
+       sentences = [s.strip() for s in re.split(r'[.!?;]+\s*', text) if s.strip()]
+       process_thread = threading.Thread(target=self.process_sentences)
+       playback_thread = threading.Thread(target=self.play_audio)
+       process_thread.daemon = True
+       playback_thread.daemon = True
+       process_thread.start()
+       playback_thread.start()
+       for sentence in sentences:
+           self.sentence_queue.put(sentence)
+       self.sentence_queue.put(None)
+       process_thread.join()
+       playback_thread.join()
+       self.stop_event.set()
+def main():
+   # Default voice selection
+   VOICE_NAME = VOICES[0]  # 'af' - Default voice (Bella & Sarah mix)
+   # Alternative voice selections (uncomment to test)
+   #VOICE_NAME = VOICES[1]  # 'af_bella' - Female American
+   #VOICE_NAME = VOICES[2]  # 'af_sarah' - Female American
+   #VOICE_NAME = VOICES[3]  # 'am_adam' - Male American
+   #VOICE_NAME = VOICES[4]  # 'am_michael' - Male American
+   #VOICE_NAME = VOICES[5]  # 'bf_emma' - Female British
+   #VOICE_NAME = VOICES[6]  # 'bf_isabella' - Female British
+   VOICE_NAME = VOICES[7]  # 'bm_george' - Male British
+   # VOICE_NAME = VOICES[8]  # 'bm_lewis' - Male British
+   #VOICE_NAME = VOICES[9]  # 'af_nicole' - Female American
+   #VOICE_NAME = VOICES[10] # 'af_sky' - Female American
+   processor = KokoroProcessor()
+   if not processor.setup_kokoro(VOICE_NAME):
+       return
+   # test_text = "How could I know? It's an unanswerable question. Like asking an unborn child if they'll lead a good life. They haven't even been born."
+   # test_text = "This 2022 Edition of Georgia Juvenile Practice and Procedure is a complete guide to handling cases in the juvenile courts of Georgia. This handy, yet thorough, manual incorporates the revised Juvenile Code and makes all Georgia statutes and major cases regarding juvenile proceedings quickly accessible. Since last year's edition, new material has been added and/or existing material updated on the following subjects, among others:"
+   # test_text = "See Ga. Code § 3925 (1863), now O.C.G.A. § 9-14-2; Ga. Code § 1744 (1863), now O.C.G.A. § 19-7-1; Ga. Code § 1745 (1863), now O.C.G.A. § 19-9-2; Ga. Code § 1746 (1863), now O.C.G.A. § 19-7-4; and Ga. Code § 3024 (1863), now O.C.G.A. § 19-7-4. For a full discussion of these provisions, see 27 Emory L. J. 195, 225–230, 232–233, 236–238 (1978). Note, however, that the journal article refers to the section numbers of the Code of 1910."
+   # test_text = "It is impossible to understand modern juvenile procedure law without an appreciation of some fundamentals of historical development. The beginning point for study is around the beginning of the seventeenth century, when the pater patriae concept first appeared in English jurisprudence. As "father of the country," the Crown undertook the duty of caring for those citizens who were unable to care for themselves—lunatics, idiots, and, ultimately, infants. This concept, which evolved into the parens patriae doctrine, presupposed the Crown's power to intervene in the parent-child relationship in custody disputes in order to protect the child's welfare1 and, ultimately, to deflect a delinquent child from a life of crime. The earliest statutes premised upon the parens patriae doctrine concerned child custody matters. In 1863, when the first comprehensive Code of Georgia was enacted, two courts exercised some jurisdiction over questions of child custody: the superior court and the court of the ordinary (now probate court). In essence, the draftsmen of the Code simply compiled what was then the law as a result of judicial decisions and statutes. The Code of 1863 contained five provisions concerning the parentchild relationship: Two concerned the jurisdiction of the superior court and courts of ordinary in habeas corpus and forfeiture of parental rights actions, and the remaining three concerned the guardianship jurisdiction of the court of the ordinary"
+   # test_text = "You are a helpful British butler who clearly and directly answers questions in a succinct fashion based on contexts provided to you. If you cannot find the answer within the contexts simply tell me that the contexts do not provide an answer. However, if the contexts partially address a question you answer based on what the contexts say and then briefly summarize the parts of the question that the contexts didn't provide an answer to.  Also, you should be very respectful to the person asking the question and frequently offer traditional butler services like various fancy drinks, snacks, various butler services like shining of shoes, pressing of suites, and stuff like that. Also, if you can't answer the question at all based on the provided contexts, you should apologize profusely and beg to keep your job.  Lastly, it is essential that if there are no contexts actually provided it means that a user's question wasn't relevant and you should state that you can't answer based off of the contexts because there are none.  And it goes without saying you should refuse to answer any questions that are not directly answerable by the provided contexts.  Moreover, some of the contexts might not have relevant information and you shoud simply ignore them and focus on only answering a user's question.  I cannot emphasize enought that you must gear your answer towards using this program and based your response off of the contexts you receive."
+   test_text = "According to OCGA § 15-11-145(a), the preliminary protective hearing must be held promptly and not later than 72 hours after the child is placed in foster care. However, if the 72-hour time frame expires on a weekend or legal holiday, the hearing should be held on the next business day that is not a weekend or holiday."
+   processor.process_and_play(test_text)
+if __name__ == "__main__":
+   main()
+```
+</details>
+<br>
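The example script splits the text into sentences and passes them through two queues, using `None` as a sentinel to shut the worker threads down. The following is a minimal sketch of that pattern with no model or audio device involved; `fake_synthesize` is a hypothetical stand-in for the model call, and `run_pipeline` collects results in a list where the real script would play audio.

```python
import queue
import re
import threading


def split_sentences(text):
    # Same pattern as the example script: split on runs of . ! ? ;
    return [s.strip() for s in re.split(r'[.!?;]+\s*', text) if s.strip()]


def fake_synthesize(sentence):
    # Hypothetical stand-in for the model call; returns a placeholder string.
    return f"<audio for: {sentence}>"


def run_pipeline(text):
    sentence_queue = queue.Queue()
    audio_queue = queue.Queue()
    played = []

    def synth_worker():
        while True:
            sentence = sentence_queue.get()
            if sentence is None:        # sentinel: no more sentences
                audio_queue.put(None)   # pass the sentinel downstream
                break
            audio_queue.put(fake_synthesize(sentence))

    def playback_worker():
        while True:
            audio = audio_queue.get()
            if audio is None:           # sentinel: nothing left to play
                break
            played.append(audio)        # the real script plays audio here

    threads = [
        threading.Thread(target=synth_worker),
        threading.Thread(target=playback_worker),
    ]
    for t in threads:
        t.start()
    for sentence in split_sentences(text):
        sentence_queue.put(sentence)
    sentence_queue.put(None)
    for t in threads:
        t.join()
    return played
```

Note that the split pattern treats every period as a sentence boundary, so an abbreviation such as "O.C.G.A." is broken into separate fragments. For text with many abbreviations, a different splitting rule may give smoother speech.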
+# Below is the original model card.
 <details><summary>ORIGINAL MODEL CARD</summary>