Metrics

TIME: 2025-05-06 14:34:50 -- STEP: 321/322 -- GLOBAL_STEP: 64200
 | > loss: -1.139192819595337  (-1.1461634561651597)
 | > log_mle: -1.1896342039108276  (-1.1868862626708554)
 | > loss_dur: 0.05044136941432953  (0.040722804967998696)
 | > amp_scaler: 1024.0  (1024.0)
 | > grad_norm: tensor(93.2843, device='cuda:0')  (tensor(114.0992, device='cuda:0'))
 | > current_lr: 4.95e-05 
 | > step_time: 1.5738  (0.7975553745792661)
 | > loader_time: 0.0148  (0.18739397280684145)
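The logged numbers fit the Glow-TTS objective being the sum of the negative log-likelihood term and the duration loss, `loss = log_mle + loss_dur`. This holds both for the current step and for the values in parentheses, which appear to be running averages over the epoch. The sketch below parses the log lines above and checks that relationship. `parse_metric` is a hypothetical helper written for illustration. It is not part of coqui-tts.

```python
import re

# Training-log lines copied from the step-321 output above.
LOG = """
 | > loss: -1.139192819595337  (-1.1461634561651597)
 | > log_mle: -1.1896342039108276  (-1.1868862626708554)
 | > loss_dur: 0.05044136941432953  (0.040722804967998696)
"""


def parse_metric(log: str, name: str) -> tuple[float, float]:
    """Return (current value, value in parentheses) for a ' | > name: x  (y)' line.

    Hypothetical helper for illustration only.
    """
    pattern = rf"\|\s*>\s*{re.escape(name)}:\s*(-?[\d.e+-]+)\s+\((-?[\d.e+-]+)\)"
    match = re.search(pattern, log)
    if match is None:
        raise ValueError(f"metric {name!r} not found")
    return float(match.group(1)), float(match.group(2))


loss, loss_avg = parse_metric(LOG, "loss")
mle, mle_avg = parse_metric(LOG, "log_mle")
dur, dur_avg = parse_metric(LOG, "loss_dur")

# Total loss should equal the MLE term plus the duration term (up to float32 rounding).
assert abs(loss - (mle + dur)) < 1e-6
assert abs(loss_avg - (mle_avg + dur_avg)) < 1e-6
print(f"step: {mle:+.6f} + {dur:+.6f} = {mle + dur:+.6f} (logged {loss:+.6f})")
```

The MLE term is negative because it is a per-dimension negative log-density of a continuous variable, not a probability. The duration loss is a small non-negative penalty added on top.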

EVALUATION

warning: audio amplitude out of range, auto clipped.
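The clipping warning appears to come from PyTorch's TensorBoard audio logging (`torch.utils.tensorboard`), which expects float samples in [-1, 1]. Samples outside that range are clipped before they are encoded as 16-bit PCM. If that is the source, the clipping affects only the evaluation clip written to the logs, not the model. Below is a minimal NumPy sketch of that check-and-clip step. It assumes a float waveform, and `clip_for_logging` and `to_int16` are hypothetical names.

```python
import numpy as np


def clip_for_logging(wav: np.ndarray) -> tuple[np.ndarray, bool]:
    """Clip a float waveform to [-1, 1] and report whether clipping happened.

    Hypothetical helper mirroring the check behind the warning above.
    """
    clipped = bool(np.abs(wav).max() > 1.0)
    if clipped:
        print("warning: audio amplitude out of range, auto clipped.")
    return np.clip(wav, -1.0, 1.0), clipped


def to_int16(wav: np.ndarray) -> np.ndarray:
    # Scale [-1, 1] floats to little-endian 16-bit PCM (truncating toward zero).
    return (wav * np.iinfo(np.int16).max).astype("<i2")


wav = np.array([0.0, 0.5, -1.3, 1.7], dtype=np.float32)
safe, was_clipped = clip_for_logging(wav)
pcm = to_int16(safe)
```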

--> EVAL PERFORMANCE
 | > avg_loader_time: 0.6140463147844587 (-0.0020079135894774947)
 | > avg_loss: -1.1667222261428833 (-0.044788708005632616)
 | > avg_log_mle: -1.2084908519472395 (-0.04398435865129735)
 | > avg_loss_dur: 0.04176862952964647 (-0.0008043380720274759)
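In the trainer's evaluation summary, the number in parentheses after each average appears to be the change since the previous evaluation. Under that reading, the negative deltas on the loss terms mean the evaluation losses went down. The sketch below parses the evaluation values above and reconstructs the previous averages. It assumes `delta = current - previous`, and `parse_eval` is a hypothetical helper.

```python
import re

# Evaluation averages copied from the EVAL PERFORMANCE output above.
EVAL_LINE = (
    "| > avg_loader_time: 0.6140463147844587 (-0.0020079135894774947) "
    "| > avg_loss: -1.1667222261428833 (-0.044788708005632616) "
    "| > avg_log_mle: -1.2084908519472395 (-0.04398435865129735) "
    "| > avg_loss_dur: 0.04176862952964647 (-0.0008043380720274759)"
)

# Matches 'name: value (delta)' pairs.
PAIR = re.compile(r"(\w+):\s*(-?[\d.e+-]+)\s*\((-?[\d.e+-]+)\)")


def parse_eval(line: str) -> dict[str, tuple[float, float]]:
    """Map metric name -> (current average, delta vs. previous eval). Hypothetical helper."""
    return {name: (float(cur), float(delta)) for name, cur, delta in PAIR.findall(line)}


metrics = parse_eval(EVAL_LINE)
for name, (current, delta) in metrics.items():
    previous = current - delta  # assumes delta = current - previous
    direction = "decreased" if delta < 0 else "increased"
    print(f"{name:16s} prev={previous:+.6f} now={current:+.6f} ({direction})")

# As in training, the total is the MLE term plus the duration term (float32 rounding aside).
total, mle, dur = (metrics[k][0] for k in ("avg_loss", "avg_log_mle", "avg_loss_dur"))
assert abs(total - (mle + dur)) < 1e-6
```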

Usage

Install coqui-tts

pip install coqui-tts

Using a Python script:

from TTS.api import TTS
import torch

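# Define paths and input
check_point_folder = "./ckpts"  # local folder holding the downloaded checkpoint
model_path = f"{check_point_folder}/best_model.pth"
config_path = f"{check_point_folder}/config.json"

out_path = "tts_output.wav"
# Vietnamese input text. English gloss: "Meanwhile, at the Nha Trang tourist wharf,
# thousands of tourists jostled to reach the islands in Nha Trang Bay; the waterway
# police increased their numbers to manage the crowds and keep tourists safe."
text = ("Trong khi đó, tại bến tàu du lịch Nha Trang, hàng ngàn du khách chen nhau "
        "để đi đến các đảo trên vịnh Nha Trang, lực lượng cảnh sát đường thủy đã "
        "tăng cường quân số để quản lý, đảm bảo an toàn cho du khách.")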

# Set device (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize TTS model
tts = TTS(model_path=model_path, config_path=config_path, progress_bar=True)

# Move model to the specified device
tts.to(device)

# Perform inference and save to file
tts.tts_to_file(text=text, file_path=out_path, speaker=None, language=None, split_sentences=False)
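You can sanity-check the generated file without extra dependencies by using Python's standard `wave` module. The sketch below assumes the output is an uncompressed PCM WAV, such as the 16-bit file coqui-tts writes. `wav_summary` is a hypothetical helper name, and the actual sample rate is whatever `config.json` specifies.

```python
import os
import wave


def wav_summary(path: str) -> dict:
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": rate,
            "duration_s": n_frames / rate,
        }


out_path = "tts_output.wav"  # same file name as in the inference script
if os.path.exists(out_path):
    print(wav_summary(out_path))
```

A duration close to zero or far longer than the spoken text usually points to a problem with the checkpoint or config paths rather than with playback.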

Playing the Audio File

from IPython.display import Audio, display

# Render an inline audio player (works in Jupyter/Colab notebooks)
display(Audio("tts_output.wav"))

Demo output audio:


Dataset used to train danhtran2mind/Viet-Glow-TTS-finetuning
