firstpixel/F5-TTS-pt-br · Error when using the "bigvgan" vocoder

9 days ago

First of all, thank you for the great work!

I'd like to know whether anyone else is having trouble passing "bigvgan" as the vocoder_name parameter of the "AgentF5TTS" class (used in AgentF5TTSChunk.py, available in the repository).
Here is the log:
"""
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\XXX\AppData\Local\Temp\jieba.cache
Loading model cost 0.368 seconds.
Prefix dict has been built successfully.
Word segmentation module jieba initialized.

You need to follow the README to init submodule and change the BigVGAN source code.
Fetching 31 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 31/31 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "F:\XXX\F5-TTS-pt-br\main.py", line 23, in <module>
    agent = AgentF5TTS(
  File "F:\XXX\F5-TTS-pt-br\AgentF5TTSChunk.py", line 23, in __init__
    self.model = F5TTS(ckpt_file=ckpt_file, vocoder_name=vocoder_name, device=device)
  File "F:\XXX\F5-TTS-pt-br\f5tts\lib\site-packages\f5_tts\api.py", line 53, in __init__
    self.load_vocoder_model(vocoder_name, local_path=local_path, hf_cache_dir=hf_cache_dir)
  File "F:\XXX\F5-TTS-pt-br\f5tts\lib\site-packages\f5_tts\api.py", line 59, in load_vocoder_model
    self.vocoder = load_vocoder(vocoder_name, local_path is not None, local_path, self.device, hf_cache_dir)
  File "F:\XXX\F5-TTS-pt-br\f5tts\lib\site-packages\f5_tts\infer\utils_infer.py", line 126, in load_vocoder
    vocoder = bigvgan.BigVGAN.from_pretrained(local_path, use_cuda_kernel=False)
UnboundLocalError: local variable 'bigvgan' referenced before assignment
"""

The message "You need to follow the README to init submodule and change the BigVGAN source code." indicates that there are instructions to follow in the README of the original repository (https://github.com/SWivid/F5-TTS). However, I couldn't find anything specific about this, other than running "git submodule update --init --recursive # (optional, if need bigvgan)" during installation (after cloning the original repository).
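
One quick way to check whether that submodule is actually visible to the interpreter is to ask Python if the module can be found. This is only a minimal sketch: `is_importable` is a hypothetical helper (not part of f5_tts), and the module path `third_party.BigVGAN` is the one that utils_infer.py tries to import. If the package was installed into site-packages rather than run from the cloned repository, the submodule directory may simply not be on the import path.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate `module_name` on the current import path.

    Hypothetical helper for diagnosis only; it is not part of f5_tts.
    """
    try:
        # find_spec imports parent packages, so a missing parent raises
        # ModuleNotFoundError (a subclass of ImportError).
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Module path taken from the import statement in utils_infer.py.
    print("third_party.BigVGAN importable:", is_importable("third_party.BigVGAN"))
```

Running this with the same virtual environment that runs main.py should show whether the import itself is the part that fails.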

In "utils_infer.py", located at "F:\XXX\F5-TTS-pt-br\f5tts\lib\site-packages\f5_tts\infer\utils_infer.py", this part of the code is where the error occurs:
"""
...
    elif vocoder_name == "bigvgan":
        try:
            from third_party.BigVGAN import bigvgan
        except ImportError:
            print("You need to follow the README to init submodule and change the BigVGAN source code.")
        if is_local:
            """download from https://huggingface.co/nvidia/bigvgan_v2_24khz_100band_256x/tree/main"""
            vocoder = bigvgan.BigVGAN.from_pretrained(local_path, use_cuda_kernel=False)
        else:
            local_path = snapshot_download(repo_id="nvidia/bigvgan_v2_24khz_100band_256x", cache_dir=hf_cache_dir)
            vocoder = bigvgan.BigVGAN.from_pretrained(local_path, use_cuda_kernel=False)

...
"""
Apparently, the code fetches this vocoder from NVIDIA's repository. It doesn't look to me like a problem with the API.
I'm not sure whether it is a problem with the directory structure or a missing file.
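
For what it's worth, the UnboundLocalError in the traceback looks like a side effect of the failed import rather than a separate bug: the `except ImportError` branch only prints a message, so the local name `bigvgan` is never bound, and the next line that uses it fails. A minimal sketch that reproduces the pattern outside of f5_tts follows; the function names and the module `missing_pkg_for_demo_xyz` are hypothetical stand-ins, not the library's code.

```python
def load_vocoder_print_only(vocoder_name: str):
    """Mirrors the pattern in the quoted snippet: print on ImportError, then continue."""
    if vocoder_name == "bigvgan":
        try:
            # Hypothetical stand-in for `from third_party.BigVGAN import bigvgan`.
            from missing_pkg_for_demo_xyz import bigvgan
        except ImportError:
            print("You need to follow the README to init submodule and change the BigVGAN source code.")
        # The import failed, so the local name `bigvgan` was never assigned.
        return bigvgan


def load_vocoder_fail_fast(vocoder_name: str):
    """Variant that re-raises, so the real cause (the missing module) is reported."""
    if vocoder_name == "bigvgan":
        try:
            from missing_pkg_for_demo_xyz import bigvgan
        except ImportError as exc:
            raise ImportError("BigVGAN module not found on the import path") from exc
        return bigvgan


if __name__ == "__main__":
    for loader in (load_vocoder_print_only, load_vocoder_fail_fast):
        try:
            loader("bigvgan")
        except Exception as err:
            # Show which exception type each variant surfaces.
            print(loader.__name__, "->", type(err).__name__)
```

So the thing to fix is most likely the import of `third_party.BigVGAN` itself, not the download from the NVIDIA repository.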

Any help is appreciated (:

firstpixel

Owner 9 days ago

edited 9 days ago

You can do it with the original repository just by placing the .pt and .safetensors files in the checkpoint folder; it should work there too through Gradio. Use reference audio clips of 6 to 8 seconds.
I haven't tested with BigVGAN, only with the original:

Download Vocos from huggingface charactr/vocos-mel-24khz

Kaii-Lee

7 days ago

I forgot to mention that with "vocos" it worked without any problems (:
I'll try the original repository, swapping in the .pt and .safetensors files, to see if I can get "bigvgan" working.
Thanks!