Wav2Vec2-Base with audio augmentation

This is the base Wav2Vec2 model, pretrained on augmented speech audio sampled at 16 kHz. The audio comes from the 960-hour LibriSpeech dataset, augmented as follows:

The ambient noise data comes from MUSAN and WHAM (189 hours in total, covering music, speech, and environmental noise). The reverberation data consists of 2650 room impulse response signals from Room RIR and BUT Speech@FIT.
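The two augmentations described above can be illustrated with a minimal NumPy sketch. This is not the author's pipeline: the model card does not state how noise and reverb were applied, so the function names `add_noise` and `add_reverb`, the use of a target signal-to-noise ratio (`snr_db`), and the peak normalization of the impulse response are all assumptions made for illustration.

```python
import numpy as np


def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into speech at a target signal-to-noise ratio in dB (assumed scheme)."""
    # Tile or trim the noise clip so it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    # Scale the noise so that speech_power / scaled_noise_power = 10^(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def add_reverb(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve speech with a room impulse response, keeping the original length."""
    # Peak-normalize the impulse response (an assumption, not from the model card).
    rir = rir / (np.max(np.abs(rir)) + 1e-12)
    return np.convolve(speech, rir, mode="full")[: len(speech)]
```

In a setup like this, `speech` would be a 16 kHz LibriSpeech waveform, `noise` a clip from a noise corpus such as MUSAN or WHAM, and `rir` one of the room impulse response signals, all as one-dimensional float arrays at the same sample rate.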

Model Parameters License

The model parameters are made available for non-commercial use only under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode

Contact

[email protected]
