Model Card for ryota-komatsu/bigvgan

Model Details

Model Description

This is the model card for a BigVGAN-v2 vocoder, a 🤗 Transformers model hosted on the Hub. It converts mel spectrograms into waveforms and was trained on 16 kHz speech.

  • License: MIT

Model Sources

How to Get Started with the Model

Use the commands below to set up the environment, then run the Python snippet to resynthesize a waveform with the model.

git clone https://github.com/ryota-komatsu/speech_resynth.git
cd speech_resynth

sudo apt install git-lfs  # for UTMOS

conda create -y -n py39 python=3.9.21 pip=24.0
conda activate py39
pip install -r requirements/requirements.txt

sh scripts/setup.sh  # download textlesslib and UTMOS

cd src/textlesslib
pip install -e .
cd -

import torchaudio

from src.bigvgan.bigvgan import BigVGan
from src.bigvgan.data import mel_spectrogram

wav_path = "/path/to/wav"

model = BigVGan.from_pretrained("ryota-komatsu/bigvgan").cuda()

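# load a waveform and resample it to 16 kHz, the rate of the training data
waveform, sr = torchaudio.load(wav_path)
waveform = torchaudio.functional.resample(waveform, sr, 16000)
waveform = waveform.cuda()

# compute the mel spectrogram and swap dimensions 1 and 2 for the model
spectrogram = mel_spectrogram(waveform)
spectrogram = spectrogram.transpose(1, 2)

# synthesize a waveform from the mel spectrogram
audio_values = model(spectrogram)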
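
To listen to the result, the synthesized tensor can be written to a WAV file. The following is a minimal sketch, not part of the repository: `save_wav` is a hypothetical helper, and it assumes that `audio_values` holds a single utterance of floating-point samples in [-1, 1] at 16 kHz. Those assumptions are not confirmed by this card, so check the actual output shape and range before relying on it. A sine wave stands in for the model output so the sketch runs without the checkpoint.

```python
import math
import os
import tempfile
import wave

import torch


def save_wav(audio: torch.Tensor, path: str, sample_rate: int = 16000) -> None:
    """Write a float tensor as mono 16-bit PCM (hypothetical helper)."""
    # Assumption: one utterance, so all dimensions can be flattened to 1-D.
    samples = audio.detach().cpu().float().reshape(-1)
    # Assumption: samples lie in [-1, 1]; clamp to avoid integer overflow.
    pcm = (samples.clamp(-1.0, 1.0) * 32767.0).round().to(torch.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(pcm.numpy().astype("<i2").tobytes())  # little-endian


# Stand-in for the model output: one second of a 220 Hz tone at 16 kHz.
t = torch.arange(16000) / 16000
audio_values = 0.5 * torch.sin(2 * math.pi * 220.0 * t).reshape(1, 1, -1)

out_path = os.path.join(tempfile.gettempdir(), "resynth_demo.wav")
save_wav(audio_values, out_path)
```

With the real model, pass the `audio_values` returned by `model(spectrogram)` in place of the stand-in tone.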

Training Details

Training Data

16 kHz-downsampled LibriTTS-R train set

Training Hyperparameters

  • Training regime: bf16 mixed precision
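
For readers unfamiliar with the term, bf16 mixed precision usually means running the forward pass in bfloat16 while keeping the weights and optimizer state in float32. The sketch below shows that pattern with `torch.autocast` on a toy linear layer. The layer, data, loss, and learning rate are placeholders chosen for illustration, and the training script for this model may be organized differently.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-ins: a single linear layer and random data (not the real model).
model = nn.Linear(80, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
inputs = torch.randn(4, 80)
targets = torch.randn(4, 16)

# The forward pass runs in bfloat16 where autocast considers it safe.
# device_type="cpu" keeps the sketch runnable without a GPU; use "cuda" on GPU.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    predictions = model(inputs)

# Compute the loss in float32, then update the float32 master weights.
loss = nn.functional.mse_loss(predictions.float(), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(predictions.dtype, model.weight.dtype)
```

The weights stay in float32 throughout, which matches the F32 tensor type of the published checkpoint.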

Model Architecture

BigVGAN-v2

  • Model size: 13M params (Safetensors, F32)