---
license: mit
tags:
- codec
- audio_tokenizer
- audio_codec
---

[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Respaired/Higgs_Codec_Extended)

This is an ongoing project. It is a modified version of the Higgs-Boson audio tokenizer that you can train in full. All scripts have been tested.
A few notes, however:

  - This is not backward compatible with the original checkpoint. (I think you can tweak it to be, but you must adhere to the Boson community license if you do.)
    
  - I highly recommend pretraining the model without the mel and adversarial setup first. It saves a significant amount of compute and time, and speeds up convergence. Raise the batch size as much as you can before the adversarial phase.
    
  - For the semantic teacher, I am using ```utter-project/mHuBERT-147```, which has good multilingual support. If you want the original setup, you can change it in the config.
    
  - The loss weights and hyperparameters may not be ideal; feel free to experiment with different values.

I will train a checkpoint on a large enough dataset one of these days, after figuring out a few things first, but the setup is solid.

# Training

```bash
# "yourdata.csv" lists the full paths to your audio files; any format works (.mp3, .wav, .ogg, etc.).
python train_boson_mixed_precision.py --data_csv "yourdata.csv" \
                                      --config config.json --batch_size 42 \
                                      --use_mixed_precision \
                                      --use_discriminator
```
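
If you do not have a CSV yet, here is a minimal sketch of how one could be built from a folder of audio files. Note that the layout is an assumption on my side: a single column with a `path` header holding absolute file paths. The helper names (`collect_audio_paths`, `write_data_csv`) and the extension list are hypothetical too, so check what the training script actually reads and adjust accordingly.

```python
import csv
from pathlib import Path

# Assumed set of extensions; extend it to match your dataset.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac"}


def collect_audio_paths(root):
    """Return sorted absolute paths of audio files found recursively under root."""
    root = Path(root).resolve()
    return sorted(
        str(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def write_data_csv(paths, csv_path, column="path"):
    """Write one path per row under a single header column (assumed layout)."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])  # hypothetical header name
        writer.writerows([p] for p in paths)
    return len(paths)
```

For example, `write_data_csv(collect_audio_paths("/data/audio"), "yourdata.csv")` writes the file and returns the number of rows written; the resulting CSV can then be passed to `--data_csv`.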

# Simple Inference

Take a look at the notebook.

# Batch Inference

Take a look at `boson_codeit.py`.

Happy using / training (~~inshallah~~).