ai4bharat/vits_rasa_13 · No audio output

sunnyglow

1 day ago

Hi, I tried the model on my local machine. The final output file doesn't contain any audible sound.

AshwinSankar

AI4Bharat org 1 day ago

Would you be able to share the sample code that you're using?

sunnyglow

1 day ago

Below is the code. Interestingly, the same code works fine on a Linux machine, but not on a Windows machine. The attached audio is the output from the Windows machine.

import soundfile as sf
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "தொண்டை நாட்டுக்கும் சோழ நாட்டுக்கும் இடையில் உள்ள திருமுனைப்பாடி நாட்டின் தென்பகுதியில், தில்லைச் சிற்றம்பலத்துக்கு மேற்கே இரண்டு காததூரத்தில், அலை கடல் போன்ற ஓர் ஏரி விரிந்து பரந்து கிடக்கிறது." # Example text (Tamil)
speaker_id = 18 # PAN_M
style_id = 3 # ALEXA

inputs = tokenizer(text=text, return_tensors="pt").to("cuda")
outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id)
sf.write("audio.wav", outputs.waveform.detach().cpu().squeeze(), model.config.sampling_rate)
print(outputs.waveform.shape)
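
To narrow down whether the Windows run produces an all-zero or NaN waveform (as opposed to a problem when writing the file), a small check like the one below might help. This is only a sketch: `describe_waveform` and its `silence_threshold` default are hypothetical names and values chosen for illustration, not part of the model or of `transformers`.

```python
import numpy as np

def describe_waveform(waveform, silence_threshold=1e-4):
    """Summarize an audio array so silent or invalid output is easy to spot.

    `silence_threshold` is an arbitrary illustrative cutoff, not a value
    taken from the model.
    """
    samples = np.asarray(waveform, dtype=np.float64).reshape(-1)
    finite = np.isfinite(samples)  # False for NaN and +/-inf entries
    report = {
        "num_samples": int(samples.size),
        "num_non_finite": int((~finite).sum()),
        # Peak absolute amplitude over the finite samples only.
        "peak": float(np.abs(samples[finite]).max()) if finite.any() else 0.0,
    }
    report["is_silent"] = report["peak"] < silence_threshold
    return report
```

With the script above, it could be called as `print(describe_waveform(outputs.waveform.detach().cpu().squeeze().numpy()))` on both machines. Comparing the two reports should show whether the model output itself differs between Linux and Windows.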