---
license: mit
tags:
- codec
- audio_tokenizer
- audio_codec
---
[Higgs_Codec_Extended on GitHub](https://github.com/Respaired/Higgs_Codec_Extended)
This is an ongoing project. It is a modified version of the Higgs-Boson audio tokenizer that you can fully train yourself, and all scripts have been tested.
A few notes, however:
- This is not backward compatible with the original checkpoint. (You can probably tweak it to be, but if you do, you have to adhere to the Boson community license.)
- I highly recommend pretraining the model without the mel and adversarial setup first. It saves a significant amount of compute and time, and it speeds up convergence. Raise the batch size as much as you can before the adversarial phase.
- For the semantic teacher, I am using `utter-project/mHuBERT-147`, which has good multilingual support. If you want the original setup, you can change it in the config.
- The loss weights and hyperparameters may not be ideal; feel free to experiment with different values.
I will train a checkpoint on a large enough dataset one of these days, once I have figured out a few things. The setup itself is solid.
# Training
```bash
# --data_csv: a CSV listing the full paths to your audio files
# (any audio format works: .mp3, .wav, .ogg, etc.)
python train_boson_mixed_precision.py --data_csv "yourdata.csv" \
    --config config.json --batch_size 42 \
    --use_mixed_precision \
    --use_discriminator
```
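The `--data_csv` argument takes a CSV of audio file paths. The exact layout the training script reads (header row, column name, extra columns) is not documented here, so the helper below is only a sketch under stated assumptions: one absolute path per line, no header row, and no path containing a comma or newline. The function name `make_audio_csv` and the extension list are hypothetical choices, not part of the repo; check `train_boson_mixed_precision.py` for the format it actually expects and add a header line if it needs one.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): collect audio files into a CSV
# for --data_csv. ASSUMPTION: one absolute path per line, no header row,
# and no commas or newlines in any path.
make_audio_csv() {
  local audio_dir="$1"
  local out_csv="${2:-yourdata.csv}"
  local abs_dir
  # Resolve the directory to an absolute path so the CSV holds full paths.
  abs_dir="$(cd "$audio_dir" && pwd)" || return 1
  # Find audio files recursively (case-insensitive extension match),
  # sort them for a stable order, and write them to the CSV.
  find "$abs_dir" -type f \( -iname '*.wav' -o -iname '*.mp3' \
    -o -iname '*.ogg' -o -iname '*.flac' \) | sort > "$out_csv"
}
```

Usage: run `make_audio_csv /path/to/audio yourdata.csv`, then pass `yourdata.csv` to `--data_csv`.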
# Simple Inference
Take a look at the notebook.
# Batch Inference
Take a look at `boson_codeit.py`.
Happy using / training (~~inshallah~~).