🛑 Important Note ⚠️

This model is only intended for research purposes.
Access requests must be made using an institutional, academic, or corporate email. Requests from public email providers will be denied. We appreciate your understanding.

🎙️ F5-TTS-Vietnamese-150h

A compact fine-tuned version of F5-TTS trained on 150 hours of Vietnamese speech.

🔗 For more fine-tuning experiments, visit: https://github.com/nguyenthienhy/F5-TTS-Vietnamese.

📜 License: CC-BY-NC-SA-4.0 — Non-commercial research use only.


📌 Model Details

  • Datasets: VLSP 2021, VLSP 2022, VLSP 2023, vietTTS, TeacherDinh-UEH, plus speech collected from YouTube channels.
  • Total dataset duration: 150 hours
  • Data processing techniques:
    • Remove background music from all audio using Facebook's Demucs model: https://github.com/facebookresearch/demucs
    • Exclude audio clips shorter than 1 second or longer than 30 seconds.
    • Keep the original punctuation marks unchanged.
    • Normalize all text to lowercase.
  • Training Configuration:
    • Base Model: F5-TTS_Base
    • GPU: RTX 3090
    • Batch Size: 3200 frames
  • Training Progress: Stopped at 350,000 steps
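
The duration filter and lowercase normalization listed under data processing can be sketched as follows. This is a minimal illustration, not the author's actual preprocessing script: the function name `prepare_metadata` and the `(path, duration, transcript)` tuple layout are assumptions made for the example, and clip durations are assumed to have been measured beforehand.

```python
def prepare_metadata(entries, min_sec=1.0, max_sec=30.0):
    """Filter clips by duration and lowercase their transcripts.

    `entries` is an iterable of (audio_path, duration_seconds, transcript)
    tuples. This layout is hypothetical, chosen only for illustration.
    """
    kept = []
    for path, duration, text in entries:
        # Drop clips shorter than 1 second or longer than 30 seconds.
        if duration < min_sec or duration > max_sec:
            continue
        # Lowercase the transcript; punctuation is left untouched.
        kept.append((path, text.lower()))
    return kept
```

Python's `str.lower()` is Unicode-aware, so Vietnamese letters such as `Đ` are lowercased along with ASCII characters.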

🛑 Update Note

Thank you to Teacher Định of the University of Economics Ho Chi Minh City (UEH) for providing an additional 50-hour, high-quality labeled dataset.

His contact: https://www.facebook.com/luudinhit93

📝 Usage

To load and use the model, follow the example below:

git clone https://github.com/nguyenthienhy/F5-TTS-Vietnamese
cd F5-TTS-Vietnamese
python -m pip install -e .
f5-tts_infer-cli \
--model "F5TTS_Base" \
--ref_audio ref.wav \
--ref_text "cả hai bên hãy cố gắng hiểu cho nhau" \
--gen_text "mình muốn ra nước ngoài để tiếp xúc nhiều công ty lớn, sau đó mang những gì học được về việt nam giúp xây dựng các công trình tốt hơn" \
--speed 1.0 \
--vocoder_name vocos \
--vocab_file data/your_training_dataset/vocab.txt \
--ckpt_file ckpts/your_training_dataset/model_350000.pt
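
For batch synthesis it can be convenient to drive the same command from Python. The sketch below only assembles the argument list using the flags shown in the command above and passes it to `subprocess`. The helper names `build_infer_command` and `run_inference` are hypothetical, and the sketch assumes `f5-tts_infer-cli` is on the `PATH` after installation.

```python
import shlex
import subprocess


def build_infer_command(ref_audio, ref_text, gen_text, vocab_file, ckpt_file,
                        model="F5TTS_Base", speed=1.0, vocoder_name="vocos"):
    """Return the f5-tts_infer-cli invocation as an argument list.

    Only flags that appear in the usage example are used here.
    """
    return [
        "f5-tts_infer-cli",
        "--model", model,
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
        "--speed", str(speed),
        "--vocoder_name", vocoder_name,
        "--vocab_file", vocab_file,
        "--ckpt_file", ckpt_file,
    ]


def run_inference(cmd):
    """Print the command, then run it (assumes the CLI is installed)."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with Vietnamese text that contains spaces or commas.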