Update README.md
README.md
CHANGED
@@ -1,530 +1,8 @@
-
-
-
-
-
-
-
-
-[](https://discord.gg/63Tv3F65k6)
-
-### Thanks for supporting the ebook2audiobook developers!
-[](https://ko-fi.com/athomasson2)
-
-### Run locally
-
-[](#launching-gradio-web-interface)
-
-[](https://github.com/DrewThomasson/ebook2audiobook/actions/workflows/Docker-Build.yml) [](https://github.com/DrewThomasson/ebook2audiobook/releases/latest)
-
-
-<a href="https://github.com/DrewThomasson/ebook2audiobook">
-<img src="https://img.shields.io/badge/Platform-mac%20|%20linux%20|%20windows-lightgrey" alt="Platform">
-</a><a href="https://hub.docker.com/r/athomasson2/ebook2audiobook">
-<img alt="Docker Pull Count" src="https://img.shields.io/docker/pulls/athomasson2/ebook2audiobook.svg"/>
-</a>
-
-### Run Remotely
-[](https://huggingface.co/spaces/drewThomasson/ebook2audiobook)
-[](https://colab.research.google.com/github/DrewThomasson/ebook2audiobook/blob/main/Notebooks/colab_ebook2audiobook.ipynb) [](https://github.com/Rihcus/ebook2audiobookXTTS/blob/main/Notebooks/kaggle-ebook2audiobook.ipynb)
-
-#### GUI Interface
-
-
-<details>
-<summary>Click to see images of Web GUI</summary>
-<img width="1728" alt="GUI Screen 1" src="assets/gui_1.png">
-<img width="1728" alt="GUI Screen 2" src="assets/gui_2.png">
-<img width="1728" alt="GUI Screen 3" src="assets/gui_3.png">
-</details>
-
-## Demos
-
-**New Default Voice Demo**
-
-https://github.com/user-attachments/assets/750035dc-e355-46f1-9286-05c1d9e88cea
-
-<details>
-<summary>More Demos</summary>
-
-**ASMR Voice**
-
-https://github.com/user-attachments/assets/68eee9a1-6f71-4903-aacd-47397e47e422
-
-**Rainy Day Voice**
-
-https://github.com/user-attachments/assets/d25034d9-c77f-43a9-8f14-0d167172b080
-
-**Scarlett Voice**
-
-https://github.com/user-attachments/assets/b12009ee-ec0d-45ce-a1ef-b3a52b9f8693
-
-**David Attenborough Voice**
-
-https://github.com/user-attachments/assets/81c4baad-117e-4db5-ac86-efc2b7fea921
-
-**Example**
-
-
-</details>
-
-## README.md
-
-## Table of Contents
-- [ebook2audiobook](#-ebook2audiobook)
-- [Features](#features)
-- [GUI Interface](#gui-interface)
-- [Demos](#demos)
-- [Supported Languages](#supported-languages)
-- [Minimum Requirements](#hardware-requirements)
-- [Usage](#launching-gradio-web-interface)
-- [Run Locally](#launching-gradio-web-interface)
-- [Launching Gradio Web Interface](#launching-gradio-web-interface)
-- [Basic Headless Usage](#basic--usage)
-- [Headless Custom XTTS Model Usage](#example-of-custom-model-zip-upload)
-- [Help command output](#help-command-output)
-- [Run Remotely](#run-remotely)
-- [Fine Tuned TTS models](#fine-tuned-tts-models)
-- [Collection of Fine-Tuned TTS Models](#fine-tuned-tts-collection)
-- [Train XTTSv2](#fine-tune-your-own-xttsv2-model)
-- [Docker](#docker-gpu-options)
-- [GPU options](#docker-gpu-options)
-- [Docker Run](#running-the-pre-built-docker-container)
-- [Docker Build](#building-the-docker-container)
-- [Docker Compose](#docker-compose)
-- [Docker headless guide](#docker-headless-guide)
-- [Docker container file locations](#docker-container-file-locations)
-- [Common Docker issues](#common-docker-issues)
-- [Supported eBook Formats](#supported-ebook-formats)
-- [Output Formats](#output-formats)
-- [Updating to Latest Version](#updating-to-latest-version)
-- [Revert to older Version](#reverting-to-older-versions)
-- [Common Issues](#common-issues)
-- [Special Thanks](#special-thanks)
-- [Table of Contents](#table-of-contents)
-
-
-## Features
-- 📚 Splits eBook into chapters for organized audio.
-- 🎙️ High-quality text-to-speech with [Coqui XTTSv2](https://huggingface.co/coqui/XTTS-v2) and [Fairseq](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) (and more).
-- 🗣️ Optional voice cloning with your own voice file.
-- 🌍 Supports 1,100+ languages (English by default). [List of Supported languages](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html)
-- 🖥️ Designed to run on 4 GB RAM.
-
-
-## Supported Languages
-| **Arabic (ar)** | **Chinese (zh)** | **English (en)** | **Spanish (es)** |
-|:------------------:|:------------------:|:------------------:|:------------------:|
-| **French (fr)** | **German (de)** | **Italian (it)** | **Portuguese (pt)** |
-| **Polish (pl)** | **Turkish (tr)** | **Russian (ru)** | **Dutch (nl)** |
-| **Czech (cs)** | **Japanese (ja)** | **Hindi (hi)** | **Bengali (bn)** |
-| **Hungarian (hu)** | **Korean (ko)** | **Vietnamese (vi)**| **Swedish (sv)** |
-| **Persian (fa)** | **Yoruba (yo)** | **Swahili (sw)** | **Indonesian (id)**|
-| **Slovak (sk)** | **Croatian (hr)** | **Tamil (ta)** | **Danish (da)** |
-- [**1,100+ languages and dialects here**](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html)
-
-
-## Hardware Requirements
-- 4 GB RAM minimum, 8 GB recommended
-- Virtualization enabled if running on Windows (Docker only)
-- CPU (Intel, AMD, ARM), GPU (Nvidia, AMD*, Intel*) (Recommended), MPS (Apple Silicon CPU)
-*available very soon
-
-> [!IMPORTANT]
-**Before posting an install or bug issue, search the open and closed issues tabs carefully<br>
-to be sure your issue does not already exist.**
-
-
->[!NOTE]
-**eBooks lack any standard structure defining what is a chapter, paragraph, preface, etc.,<br>
-so you should first manually remove any text you don't want converted to audio.**
-
-### Installation Instructions
-1. **Clone repo**
-```bash
-git clone https://github.com/DrewThomasson/ebook2audiobook.git
-cd ebook2audiobook
-```
-
-### Launching Gradio Web Interface
-1. **Run ebook2audiobook**:
-- **Linux/MacOS**
-```bash
-./ebook2audiobook.sh # Run launch script
-```
-
-- **Mac Launcher**
-Double-click `Mac Ebook2Audiobook Launcher.command`
-
-
-- **Windows**
-```bash
-ebook2audiobook.cmd # Run launch script or double-click on it
-```
-
-- **Windows Launcher**
-Double-click `ebook2audiobook.cmd`
-
-
-- **Manual Python Install**
-```bash
-# (for experts only!)
-REQUIRED_PROGRAMS=("calibre" "ffmpeg" "nodejs" "mecab" "espeak-ng" "rust" "sox")
-REQUIRED_PYTHON_VERSION="3.12"
-pip install -r requirements.txt # Install Python Requirements
-python app.py # Run Ebook2Audiobook
-```
-
-1. **Open the Web App**: Click the URL provided in the terminal to access the web app and convert eBooks. `http://localhost:7860/`
-2. **For Public Link**:
-`python app.py --share` (all OS)
-`./ebook2audiobook.sh --share` (Linux/MacOS)
-`ebook2audiobook.cmd --share` (Windows)
-
-> [!IMPORTANT]
-**If the script is stopped and run again, you need to refresh the Gradio GUI<br>
-to let the web page reconnect to the new connection socket.**
-
-### Basic Usage
-- **Linux/MacOS**:
-```bash
-./ebook2audiobook.sh --headless --ebook <path_to_ebook_file> \
---voice [path_to_voice_file] --language [language_code]
-```
-- **Windows**
-```bash
-ebook2audiobook.cmd --headless --ebook <path_to_ebook_file> ^
---voice [path_to_voice_file] --language [language_code]
-```
-
-- **[--ebook]**: Path to your eBook file
-- **[--voice]**: Voice cloning file path (optional)
-- **[--language]**: Language code in ISO-639-3 (e.g. ita for Italian, eng for English, deu for German...).<br>
-The default language is eng (set in ./lib/lang.py), so --language is optional when converting in the default language.<br>
-Two-letter ISO-639-1 codes are also supported.
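The headless call above can also be assembled from a script. The following is only a minimal sketch, assuming the `--headless`, `--ebook`, `--voice` and `--language` flags behave as the README describes; the helper name `build_headless_command` and its default script path are hypothetical and not part of the project.

```python
from typing import List, Optional


def build_headless_command(
    ebook: str,
    voice: Optional[str] = None,
    language: Optional[str] = None,
    script: str = "./ebook2audiobook.sh",  # assumed default; use ebook2audiobook.cmd on Windows
) -> List[str]:
    """Return an argv list for a headless conversion (illustrative helper only)."""
    cmd = [script, "--headless", "--ebook", ebook]
    if voice:  # optional voice cloning file
        cmd += ["--voice", voice]
    if language:  # optional ISO-639-3 or ISO-639-1 code
        cmd += ["--language", language]
    return cmd


if __name__ == "__main__":
    # An argv list can be handed to subprocess.run without shell quoting of paths.
    print(build_headless_command("book.epub", language="eng"))
```

Building a list rather than one command string keeps paths with spaces intact when the list is passed to `subprocess.run`.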
-
-
-### Example of Custom Model Zip Upload
-(must be a .zip file containing the mandatory model files. Example for XTTSv2: config.json, model.pth, vocab.json and ref.wav)
-- **Linux/MacOS**
-```bash
-./ebook2audiobook.sh --headless --ebook <ebook_file_path> \
---voice <target_voice_file_path> --language <language> --custom_model <custom_model_path>
-```
-- **Windows**
-```bash
-ebook2audiobook.cmd --headless --ebook <ebook_file_path> ^
---voice <target_voice_file_path> --language <language> --custom_model <custom_model_path>
-```
-- **<custom_model_path>**: Path to `model_name.zip` file,
-which must contain (according to the tts engine) all the mandatory files<br>
-(see ./lib/models.py).
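Before passing a zip to `--custom_model`, it can help to check that the mandatory files are present. This is a rough sketch using only the XTTSv2 file list given above; the helper `missing_model_files` is hypothetical, and matching files inside sub-folders by base name is an assumption, not confirmed project behaviour (the authoritative list is in ./lib/models.py).

```python
import io
import zipfile
from typing import List

# Files the README lists as mandatory for an XTTSv2 custom model.
XTTSV2_REQUIRED = ("config.json", "model.pth", "vocab.json", "ref.wav")


def missing_model_files(zip_source, required=XTTSV2_REQUIRED) -> List[str]:
    """Return the required file names that are absent from the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        # Compare base names so files nested in a sub-folder still count (assumption).
        present = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Build a small in-memory archive that lacks ref.wav, then report what is missing.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("config.json", "model.pth", "vocab.json"):
            archive.writestr(name, b"")
    buffer.seek(0)
    print(missing_model_files(buffer))
```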
-
-
-### For a Detailed Guide with the List of All Parameters
-- **Linux/MacOS**
-```bash
-./ebook2audiobook.sh --help
-```
-- **Windows**
-```bash
-ebook2audiobook.cmd --help
-```
-- **Or for all OS**
-```bash
-python app.py --help
-```
-
-<a id="help-command-output"></a>
-```bash
-usage: app.py [-h] [--session SESSION] [--share] [--headless] [--ebook EBOOK]
-[--ebooks_dir EBOOKS_DIR] [--language LANGUAGE] [--voice VOICE]
-[--device {cpu,gpu,mps}]
-[--tts_engine {XTTSv2,BARK,VITS,FAIRSEQ,TACOTRON2,YOURTTS,xtts,bark,vits,fairseq,tacotron,yourtts}]
-[--custom_model CUSTOM_MODEL] [--fine_tuned FINE_TUNED]
-[--output_format OUTPUT_FORMAT] [--temperature TEMPERATURE]
-[--length_penalty LENGTH_PENALTY] [--num_beams NUM_BEAMS]
-[--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K]
-[--top_p TOP_P] [--speed SPEED] [--enable_text_splitting]
-[--text_temp TEXT_TEMP] [--waveform_temp WAVEFORM_TEMP]
-[--output_dir OUTPUT_DIR] [--version]
-
-Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the Gradio interface or run the script in headless mode for direct conversion.
-
-options:
--h, --help show this help message and exit
---session SESSION Session to resume the conversion in case of interruption, crash,
-or reuse of custom models and custom cloning voices.
-
-**** The following options are for all modes:
-Optional
-
-**** The following options are for gradio/gui mode only:
-Optional
-
---share Enable a public shareable Gradio link.
-
-**** The following options are for --headless mode only:
---headless Run the script in headless mode
---ebook EBOOK Path to the ebook file for conversion. Cannot be used when --ebooks_dir is present.
---ebooks_dir EBOOKS_DIR
-Relative or absolute path of the directory containing the files to convert.
-Cannot be used when --ebook is present.
---language LANGUAGE Language of the e-book. Default language is set
-in ./lib/lang.py and used as default if not present. All compatible language codes are in ./lib/lang.py
-
-optional parameters:
---voice VOICE (Optional) Path to the voice cloning file for TTS engine.
-Uses the default voice if not present.
---device {cpu,gpu,mps}
-(Optional) Processor unit type for the conversion.
-Default is set in ./lib/conf.py if not present. Fall back to CPU if GPU not available.
---tts_engine {XTTSv2,BARK,VITS,FAIRSEQ,TACOTRON2,YOURTTS,xtts,bark,vits,fairseq,tacotron,yourtts}
-(Optional) Preferred TTS engine (available are: ['XTTSv2', 'BARK', 'VITS', 'FAIRSEQ', 'TACOTRON2', 'YOURTTS', 'xtts', 'bark', 'vits', 'fairseq', 'tacotron', 'yourtts']).
-Default depends on the selected language. The tts engine should be compatible with the chosen language
---custom_model CUSTOM_MODEL
-(Optional) Path to the custom model zip file containing mandatory model files.
-Please refer to ./lib/models.py
---fine_tuned FINE_TUNED
-(Optional) Fine tuned model path. Default is builtin model.
---output_format OUTPUT_FORMAT
-(Optional) Output audio format. Default is set in ./lib/conf.py
---temperature TEMPERATURE
-(xtts only, optional) Temperature for the model.
-Default to config.json model. Higher temperatures lead to more creative outputs.
---length_penalty LENGTH_PENALTY
-(xtts only, optional) A length penalty applied to the autoregressive decoder.
-Default to config.json model. Not applied to custom models.
---num_beams NUM_BEAMS
-(xtts only, optional) Controls how many alternative sequences the model explores. Must be equal or greater than length penalty.
-Default to config.json model.
---repetition_penalty REPETITION_PENALTY
-(xtts only, optional) A penalty that prevents the autoregressive decoder from repeating itself.
-Default to config.json model.
---top_k TOP_K (xtts only, optional) Top-k sampling.
-Lower values mean more likely outputs and increased audio generation speed.
-Default to config.json model.
---top_p TOP_P (xtts only, optional) Top-p sampling.
-Lower values mean more likely outputs and increased audio generation speed. Default to config.json model.
---speed SPEED (xtts only, optional) Speed factor for the speech generation.
-Default to config.json model.
---enable_text_splitting
-(xtts only, optional) Enable TTS text splitting. This option is known to not be very efficient.
-Default to config.json model.
---text_temp TEXT_TEMP
-(bark only, optional) Text Temperature for the model.
-Default to 0.85. Higher temperatures lead to more creative outputs.
---waveform_temp WAVEFORM_TEMP
-(bark only, optional) Waveform Temperature for the model.
-Default to 0.5. Higher temperatures lead to more creative outputs.
---output_dir OUTPUT_DIR
-(Optional) Path to the output directory. Default is set in ./lib/conf.py
---version Show the version of the script and exit
-
-Example usage:
-Windows:
-Gradio/GUI:
-ebook2audiobook.cmd
-Headless mode:
-ebook2audiobook.cmd --headless --ebook '/path/to/file'
-Linux/Mac:
-Gradio/GUI:
-./ebook2audiobook.sh
-Headless mode:
-./ebook2audiobook.sh --headless --ebook '/path/to/file'
-
-Tip: to add silence (1.4 seconds) to your text, just use "###" or "[pause]".
-
-```
-
-NOTE: in gradio/gui mode, to cancel a running conversion, just click the [X] in the ebook upload component.
-
-TIP: if you need more pauses, add '###' or '[pause]' between the words where you want a longer pause. One [pause] equals 1.4 seconds.
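To get a feel for how much silence the markers add, the arithmetic can be sketched as below. This only counts occurrences of `###` and `[pause]` and multiplies by the 1.4 seconds stated in the tip; the function name is hypothetical, and the project's own parser may treat edge cases (such as runs of more than three `#`) differently.

```python
import re

PAUSE_SECONDS = 1.4  # per the tip: one marker equals 1.4 seconds
PAUSE_PATTERN = re.compile(r"###|\[pause\]")  # the two markers named in the tip


def added_silence_seconds(text: str) -> float:
    """Estimate the total silence the pause markers in `text` would add."""
    return len(PAUSE_PATTERN.findall(text)) * PAUSE_SECONDS


if __name__ == "__main__":
    sample = "Chapter one. ### It was a dark night. [pause] Then the rain came."
    print(f"{added_silence_seconds(sample):.1f} seconds of added silence")
```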
-
-#### Docker GPU Options
-
-Available pre-built tags: `latest` (CUDA 11.8)
-#### Edit: If the GPU isn't detected, you'll have to build the image -> [Building the Docker Container](#building-the-docker-container)
-
-
-
-#### Running the pre-built Docker Container
-
-- Run with CPU only
-```powershell
-docker run --pull always --rm -p 7860:7860 athomasson2/ebook2audiobook
-```
-- Run with GPU Speedup (NVIDIA compatible only)
-```powershell
-docker run --pull always --rm --gpus all -p 7860:7860 athomasson2/ebook2audiobook
-```
-
-This command will start the Gradio interface on port 7860 (http://localhost:7860).
-- For more options add the parameter `--help`
-
-
-#### Building the Docker Container
-- You can build the docker image with the command:
-```powershell
-docker build -t athomasson2/ebook2audiobook .
-```
-#### Available Docker Build Arguments
-
-`--build-arg TORCH_VERSION=cuda118` Available tags: [cuda121, cuda118, cuda128, rocm, xpu, cpu]
-
-All CUDA version numbers should work, e.g. CUDA 11.6 -> cuda116
-
-`--build-arg SKIP_XTTS_TEST=true` (Saves space by not baking XTTSv2 model into docker image)
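The CUDA naming rule above (CUDA 11.6 -> cuda116) amounts to dropping the dot and prefixing `cuda`. A small sketch of that mapping follows; the helper name `cuda_torch_tag` is hypothetical, and whether a given tag actually builds still depends on the Dockerfile.

```python
def cuda_torch_tag(cuda_version: str) -> str:
    """Turn a CUDA version such as '11.8' into a TORCH_VERSION tag such as 'cuda118'."""
    # Unpacking raises ValueError when the input has no minor part (e.g. "12").
    major, minor = cuda_version.strip().split(".")[:2]
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"unexpected CUDA version: {cuda_version!r}")
    return f"cuda{major}{minor}"


if __name__ == "__main__":
    tag = cuda_torch_tag("11.6")
    # The resulting tag is what would be passed as the build argument.
    print(f"docker build --build-arg TORCH_VERSION={tag} -t athomasson2/ebook2audiobook .")
```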
-
-
-## Docker container file locations
-All ebook2audiobook directories have the base dir `/app/`
-For example:
-`tmp` = `/app/tmp`
-`audiobooks` = `/app/audiobooks`
-
-
-## Docker headless guide
-
-- Before you run this, create a dir named "input-folder" in your current dir;
-it will be mounted into the container, and it is where you put the input files for the docker image to see
-```bash
-mkdir input-folder && mkdir audiobooks
-```
-- In the command below, swap out **YOUR_EBOOK_FILE** with the name of your input file
-```bash
-docker run --pull always --rm \
--v $(pwd)/input-folder:/app/input_folder \
--v $(pwd)/audiobooks:/app/audiobooks \
-athomasson2/ebook2audiobook \
---headless --ebook /app/input_folder/YOUR_EBOOK_FILE
-```
-- The output audiobooks will be found in the `audiobooks` folder, which is also located
-in the local dir where you ran this docker command
-
-
-## To see the help for the program's other parameters, run:
-
-```bash
-docker run --pull always --rm athomasson2/ebook2audiobook --help
-
-```
-That will output this
-[Help command output](#help-command-output)
-
-
-### Docker Compose
-This project uses Docker Compose to run locally. You can enable or disable GPU support
-by setting either `*gpu-enabled` or `*gpu-disabled` in `docker-compose.yml`
-
-
-#### Steps to Run
-1. **Clone the Repository** (if you haven't already):
-```bash
-git clone https://github.com/DrewThomasson/ebook2audiobook.git
-cd ebook2audiobook
-```
-2. **Set GPU Support (disabled by default)**
-To enable GPU support, modify `docker-compose.yml` and change `*gpu-disabled` to `*gpu-enabled`
-3. **Start the service:**
-```bash
-# Docker
-docker-compose up -d # To update add --build
-
-# Podman
-podman compose -f podman-compose.yml up -d # To update add --build
-```
-4. **Access the service:**
-The service will be available at http://localhost:7860.
-
-
-## Common Docker Issues
-
-- My NVIDIA GPU isn't being detected? -> [GPU ISSUES Wiki Page](https://github.com/DrewThomasson/ebook2audiobook/wiki/GPU-ISSUES)
-
-- `python: can't open file '/home/user/app/app.py': [Errno 2] No such file or directory` (Just remove all trailing arguments, as I replaced the `CMD` with `ENTRYPOINT` in the [Dockerfile](Dockerfile))
-- Example: `docker run --pull always athomasson2/ebook2audiobook app.py --script_mode full_docker` -> corrected -> `docker run --pull always athomasson2/ebook2audiobook`
-- Arguments can now be added easily like this: `docker run --pull always athomasson2/ebook2audiobook --share`
-
-- Docker gets stuck downloading Fine-Tuned models.
-(This does not happen on every computer, but some appear to run into this issue)
-Disabling the progress bar appears to fix the issue,
-as discussed [here in #191](https://github.com/DrewThomasson/ebook2audiobook/issues/191)
-Example of adding this fix in the `docker run` command
-```bash
-docker run --pull always --rm --gpus all -e HF_HUB_DISABLE_PROGRESS_BARS=1 -e HF_HUB_ENABLE_HF_TRANSFER=0 \
--p 7860:7860 athomasson2/ebook2audiobook
-```
-
-
-## Fine Tuned TTS models
-#### Fine Tune your own XTTSv2 model
-
-[](https://huggingface.co/spaces/drewThomasson/xtts-finetune-webui-gpu) [](https://github.com/DrewThomasson/ebook2audiobook/blob/v25/Notebooks/finetune/xtts/kaggle-xtts-finetune-webui-gradio-gui.ipynb) [](https://colab.research.google.com/github/DrewThomasson/ebook2audiobook/blob/v25/Notebooks/finetune/xtts/colab_xtts_finetune_webui.ipynb)
-
-
-
-
-
-#### De-noise training data
-
-[](https://huggingface.co/spaces/drewThomasson/DeepFilterNet2_no_limit) [](https://github.com/Rikorose/DeepFilterNet)
-
-
-### Fine Tuned TTS Collection
-
-[](https://huggingface.co/drewThomasson/fineTunedTTSModels/tree/main)
-
-For an XTTSv2 custom model, a reference audio clip of the voice is mandatory:
-
-
-## Supported eBook Formats
-- `.epub`, `.pdf`, `.mobi`, `.txt`, `.html`, `.rtf`, `.chm`, `.lit`,
-`.pdb`, `.fb2`, `.odt`, `.cbr`, `.cbz`, `.prc`, `.lrf`, `.pml`,
-`.snb`, `.cbc`, `.rb`, `.tcr`
-- **Best results**: `.epub` or `.mobi` for automatic chapter detection
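A pre-flight check against the list above might look like the following. It is only a sketch built from the extensions named in this section; the function `check_ebook` and its three return labels are hypothetical and do not reflect how the project itself validates input.

```python
from pathlib import Path

# Extensions copied from the "Supported eBook Formats" list above.
SUPPORTED_EXTENSIONS = {
    ".epub", ".pdf", ".mobi", ".txt", ".html", ".rtf", ".chm", ".lit",
    ".pdb", ".fb2", ".odt", ".cbr", ".cbz", ".prc", ".lrf", ".pml",
    ".snb", ".cbc", ".rb", ".tcr",
}
# Formats the README recommends for automatic chapter detection.
CHAPTER_FRIENDLY = {".epub", ".mobi"}


def check_ebook(path: str) -> str:
    """Classify a file name as 'best', 'supported' or 'unsupported' by extension."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return "unsupported"
    return "best" if ext in CHAPTER_FRIENDLY else "supported"


if __name__ == "__main__":
    for name in ("novel.epub", "notes.txt", "report.docx"):
        print(name, "->", check_ebook(name))
```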
-
-
-## Output Formats
-- Creates an audio file with metadata and chapters in one of `['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac']` (set in ./lib/conf.py).
-
-## Updating to Latest Version
-```bash
-git pull # Locally/Compose
-
-docker pull athomasson2/ebook2audiobook:latest # For pre-built docker images
-```
-
-## Reverting to older Versions
-Releases can be found -> [here](https://github.com/DrewThomasson/ebook2audiobook/releases)
-```bash
-git checkout tags/VERSION_NUM # Locally/Compose -> Example: git checkout tags/v25.7.7
-
-docker pull athomasson2/ebook2audiobook:VERSION_NUM # For pre-built docker images -> Example: athomasson2/ebook2audiobook:v25.7.7
-```
-
-## Common Issues:
-- My NVIDIA GPU isn't being detected? -> [GPU ISSUES Wiki Page](https://github.com/DrewThomasson/ebook2audiobook/wiki/GPU-ISSUES)
-- CPU is slow (better on server-grade SMP CPUs), while an NVIDIA GPU can achieve almost real-time conversion.
-[Discussion about this](https://github.com/DrewThomasson/ebook2audiobook/discussions/19#discussioncomment-10879846)
-For faster multilingual generation I would suggest my other
-[project that uses piper-tts](https://github.com/DrewThomasson/ebook2audiobookpiper-tts) instead
-(It doesn't have zero-shot voice cloning though, and its voices are Siri quality, but it is much faster on CPU).
-- "I'm having dependency issues" - Just use the docker image; it's fully self-contained and has a headless mode.
-Add the `--help` parameter at the end of the docker run command for more information.
-- "I'm getting a truncated audio issue!" - PLEASE MAKE AN ISSUE OF THIS,
-we don't speak every language and need advice from users to fine-tune the sentence splitting logic.😊
-
-
-## What we need help with! 🙌
-## [Full list of things can be found here](https://github.com/DrewThomasson/ebook2audiobook/issues/32)
-- Any help from people speaking any of the supported languages to help us improve the models
-
-## Do you need to rent a GPU to boost service from us?
-- A poll is open here https://github.com/DrewThomasson/ebook2audiobook/discussions/889
-
-## Special Thanks
-- **Coqui TTS**: [Coqui TTS GitHub](https://github.com/idiap/coqui-ai-TTS)
-- **Calibre**: [Calibre Website](https://calibre-ebook.com)
-- **FFmpeg**: [FFmpeg Website](https://ffmpeg.org)
-- [@shakenbake15 for better chapter saving method](https://github.com/DrewThomasson/ebook2audiobook/issues/8)
+---
+title: ebook2audiobook (Docker)
+emoji: 🎧
+colorFrom: blue
+colorTo: green
+sdk: docker
+pinned: false
+---