Model Card for ryota-komatsu/flow_matching_with_bigvgan
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. It resynthesizes speech by encoding a waveform into discrete pseudo-phonetic units with an mHuBERT-based encoder and decoding those units back into audio with a conditional flow matching model and a BigVGAN vocoder.
- License: MIT
Model Sources
- Repository: https://github.com/ryota-komatsu/speech_resynth
How to Get Started with the Model
Use the code below to get started with the model.
git clone https://github.com/ryota-komatsu/speech_resynth.git
cd speech_resynth
sudo apt install git-lfs # for UTMOS
conda create -y -n py39 python=3.9.21 pip=24.0
conda activate py39
pip install -r requirements/requirements.txt
sh scripts/setup.sh # download textlesslib and UTMOS
cd src/textlesslib
pip install -e .
cd -
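After setting up the environment, run the following Python code to encode a waveform into discrete units and resynthesize it: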
import torchaudio
from textless.data.speech_encoder import SpeechEncoder
from src.flow_matching.models import ConditionalFlowMatchingWithBigVGan
wav_path = "/path/to/wav"
encoder = SpeechEncoder.by_name(
dense_model_name="mhubert-base-vp_mls_cv_8lang",
quantizer_model_name="kmeans-expresso",
vocab_size=2000,
deduplicate=False,
need_f0=False,
).cuda()
# download a pretrained model from the Hugging Face Hub
decoder = ConditionalFlowMatchingWithBigVGan.from_pretrained("ryota-komatsu/flow_matching_with_bigvgan").cuda()
# load a waveform and resample it to 16 kHz, the sampling rate expected by the encoder
waveform, sr = torchaudio.load(wav_path)
waveform = torchaudio.functional.resample(waveform, sr, 16000)
# encode the waveform into pseudo-phonetic units
units = encoder(waveform.cuda())["units"]
units = units.unsqueeze(0) + 1  # add a batch dimension; unit IDs are shifted by one because 0 is reserved for padding
# resynthesize a waveform from the units
audio_values = decoder(units)
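To inspect the result, the resynthesized audio can be written to disk. This is a minimal sketch: the output tensor shape and the 16 kHz output sampling rate are assumptions, so verify them against the repository and the model configuration.
# write the resynthesized audio to a WAV file
# assumption: audio_values holds a single floating-point waveform
# assumption: the vocoder output sampling rate is 16 kHz; check the model config
output = audio_values.detach().squeeze().unsqueeze(0).cpu()  # -> (1, num_samples)
torchaudio.save("resynth.wav", output, 16000)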
Training Data