inference code?

#2
by TimesLast - opened

could someone provide inference code for this gguf?

See the GitHub repo here

If you compile TTS.cpp, you can run the model with the compiled CLI or the server. For example, run the following in the repository's directory to compile:

cmake -B build
cmake --build build --config Release
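After the build finishes, the compiled binaries should appear under `build/bin` (this path is an assumption inferred from the CLI invocation later in this thread, not stated by the repository docs). A quick sketch to check for them:

```shell
# Check whether the expected TTS.cpp binaries exist after the build.
# The build/bin location and the "cli"/"server" names are assumptions
# based on the commands quoted in this thread.
status=""
for bin in cli server; do
  if [ -x "build/bin/$bin" ]; then
    status="$status $bin:ok"
  else
    status="$status $bin:missing"
  fi
done
echo "$status"
```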

Download a model file, and you can generate speech from the same directory like so:

build/bin/cli --model-path /model/path/to/downloaded_gguf_file.gguf --prompt "I am saying some words" --save-path /tmp/test.wav
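If you want to synthesize several prompts in one go, a small wrapper loop around the same invocation works. This is only a sketch: the model path and prompts are placeholders, and the only CLI flags used (`--model-path`, `--prompt`, `--save-path`) are the ones quoted above. It falls back to printing the command when the binary is not present.

```shell
# Batch-generate speech for a list of prompts using the TTS.cpp CLI.
# MODEL is a placeholder path; replace it with your downloaded GGUF file.
MODEL=/model/path/to/downloaded_gguf_file.gguf
i=0
for prompt in "Hello there" "I am saying some words"; do
  i=$((i + 1))
  out="/tmp/sample_$i.wav"
  if [ -x build/bin/cli ]; then
    build/bin/cli --model-path "$MODEL" --prompt "$prompt" --save-path "$out"
  else
    # Dry run: show the command that would be executed.
    echo "build/bin/cli --model-path $MODEL --prompt \"$prompt\" --save-path $out"
  fi
done
```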

I'll update the README here, but please see the TTS.cpp GitHub repository for more information in the meantime.

Does this support GPU? The quantized version is even slower than the original model.

Currently this does not support GPU or Metal acceleration (there is active work to support it, though). The quantized models should not perform worse than the 32-bit versions of the model on TTS.cpp, and, on the same hardware, TTS.cpp should outperform Torch-based models running without acceleration, PP (pipeline parallelism), or TP (tensor parallelism). If you notice otherwise (I'll do a basic performance comparison later today), feel free to raise an issue on the GitHub repository.
