Model Card for ryota-komatsu/bigvgan

Model Details

Model Description

This is the model card for a BigVGAN-v2 vocoder, a 🤗 transformers model that has been pushed to the Hub. It generates waveforms from mel spectrograms.

License: MIT

Model Sources

Repository: https://github.com/ryota-komatsu/speech_resynth

How to Get Started with the Model

Use the code below to get started with the model.

git clone https://github.com/ryota-komatsu/speech_resynth.git
cd speech_resynth

sudo apt install git-lfs  # for UTMOS

conda create -y -n py39 python=3.9.21 pip=24.0
conda activate py39
pip install -r requirements/requirements.txt

sh scripts/setup.sh  # download textlesslib and UTMOS

cd src/textlesslib
pip install -e .
cd -

# run from the speech_resynth repository root so that the `src` package is importable
import torchaudio

from src.bigvgan.bigvgan import BigVGan
from src.bigvgan.data import mel_spectrogram

wav_path = "/path/to/wav"

model = BigVGan.from_pretrained("ryota-komatsu/bigvgan").cuda()

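# load a waveform and resample it to 16 kHz, the rate of the training data
waveform, sr = torchaudio.load(wav_path)
waveform = torchaudio.functional.resample(waveform, sr, 16000)
waveform = waveform.cuda()

# compute the mel spectrogram and swap dimensions 1 and 2 before passing it to the model
spectrogram = mel_spectrogram(waveform)
spectrogram = spectrogram.transpose(1, 2)

# synthesize a waveform from the mel spectrogram
audio_values = model(spectrogram)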
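
To listen to the result, the generated samples have to be written to disk. The following is a minimal sketch using only NumPy and the standard-library `wave` module. The helper name `write_wav_16bit` is hypothetical, and the sketch assumes that the model output holds a single mono utterance with float samples in [-1, 1] at 16 kHz, which this card does not state explicitly.

```python
import wave

import numpy as np


def write_wav_16bit(path, samples, sample_rate=16000):
    """Write a mono float waveform in [-1, 1] to a 16-bit PCM WAV file.

    Hypothetical helper: assumes `samples` holds exactly one utterance.
    Returns the number of samples written.
    """
    # Flatten layouts such as (1, 1, time) or (1, time) into one mono stream.
    mono = np.asarray(samples, dtype=np.float32).reshape(-1)
    # Clip before scaling so that out-of-range samples do not wrap around,
    # then convert to little-endian 16-bit integers as required by WAV.
    pcm = (np.clip(mono, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
    return len(pcm)
```

Under these assumptions, the output of the model could be saved with `write_wav_16bit("resynth.wav", audio_values.detach().float().cpu().numpy())`, where `resynth.wav` is a placeholder file name.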

Training Details

Training Data

16 kHz-downsampled LibriTTS-R train set

Training Hyperparameters

Training regime: bf16 mixed precision
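
For readers unfamiliar with bf16 mixed precision, the sketch below shows what one such training step can look like in PyTorch with `torch.autocast`. It is not the training code of this model: the linear layer, feature size, learning rate, and loss are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for the vocoder and its optimizer.
model = nn.Linear(80, 80).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
batch = torch.randn(4, 80, device=device)

# Inside autocast, eligible ops such as matrix multiplications run in
# bfloat16, while the parameters themselves stay in float32.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    output = model(batch)

# Placeholder loss, computed in float32 for numerical stability.
loss = output.float().pow(2).mean()

# bf16 has the same exponent range as float32, so this sketch does not
# use the gradient scaling that fp16 training typically requires.
optimizer.zero_grad()
loss.backward()
optimizer.step()
```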

Model Architecture

BigVGAN-v2
