Recipe to fine-tune to a new language, Hindi as an example (finally figured out)
After days of going through the documentation, I finally have a way to fine-tune this model to a new language; I chose Hindi.
You can look at how to prepare the dataset, the SentencePiece tokenizer, the training script, and the training configuration to help you get started.
Here is the notebook you can run on Kaggle: https://github.com/deepanshu-yadav/Hindi_GramVani_Finetune/blob/main/finetuning-parakeet-on-hindi-dataset.ipynb
The code for building the manifests is here: https://github.com/deepanshu-yadav/Hindi_GramVani_Finetune/blob/main/prepare_manifest.py
The code to tokenize a new language is here: https://github.com/deepanshu-yadav/Hindi_GramVani_Finetune/blob/main/tokenize_language.py
Make sure you run the tokenization code only after preparing the manifests.
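For context, a NeMo-style manifest is a JSON-lines file with one entry per utterance (audio path, duration, transcript). Below is a minimal sketch of producing one; the file paths and transcripts are placeholders, see prepare_manifest.py above for the actual logic.

```python
import json

import soundfile as sf  # assumed dependency, used only to compute durations

# Placeholder (audio_path, transcript) pairs; in practice these come from the
# GramVani metadata, see prepare_manifest.py for the real mapping.
pairs = [
    ("/data/gramvani/audio/utt_0001.wav", "पहला वाक्य"),
    ("/data/gramvani/audio/utt_0002.wav", "दूसरा वाक्य"),
]

with open("train_manifest.json", "w", encoding="utf-8") as f:
    for audio_path, text in pairs:
        audio, sr = sf.read(audio_path)
        entry = {
            "audio_filepath": audio_path,
            "duration": len(audio) / sr,  # duration in seconds
            "text": text,
        }
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```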
I had trouble running the script provided in the NeMo repository, so I modified the tokenizer script to build a BPE tokenizer for the provided transcripts.
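The core of that step is just training a SentencePiece BPE model on the manifest transcripts. Here is a minimal sketch; the vocabulary size and paths are placeholders, see tokenize_language.py above for the exact values.

```python
import json

import sentencepiece as spm

# Collect the transcripts from the manifest into a plain text file.
with open("train_manifest.json", encoding="utf-8") as fin, \
        open("transcripts.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(json.loads(line)["text"] + "\n")

# Train a BPE tokenizer; character_coverage=1.0 keeps all Devanagari characters.
spm.SentencePieceTrainer.train(
    input="transcripts.txt",
    model_prefix="tokenizer",  # writes tokenizer.model and tokenizer.vocab
    vocab_size=1024,           # placeholder value, tune for your corpus
    model_type="bpe",
    character_coverage=1.0,
)
```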
Do not run this on Google Colab; there is an issue I have filed here: https://github.com/NVIDIA/NeMo/issues/13734
Already done -> freeze the encoder and train only the decoder portion of this model.
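A minimal sketch of the encoder-freezing idea, assuming the standard NeMo loading API (the notebook above has the full training setup):

```python
import nemo.collections.asr as nemo_asr

# Load the pretrained checkpoint.
model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")

# Freeze the encoder so only the decoder/joint weights are updated during training.
model.encoder.freeze()
# Equivalently: for p in model.encoder.parameters(): p.requires_grad = False

# Sanity check: how many parameters remain trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters after freezing the encoder: {trainable:,}")
```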
There is much more we can do.
Disclaimer:
We need to run this on a GPU with more compute than what is available on Kaggle.
We need to run it for a lot more epochs.
This was just to get people started.
@pronoobie
Thanks a lot for sharing this with us.
Do you have an idea of how much VRAM it needed, and what batch size and maximum audio sample duration you used?
Hi~ thanks for sharing this information.
May I ask how is your performance after fine-tuning?
@pronoobie
Thanks a lot for sharing this with us. Do you have an idea of how much VRAM it needed, and what batch size and maximum audio sample duration you used?
For a batch size of 4 it needed about 11 GB of VRAM, so if you follow the default configuration you may need up to 50 GB of VRAM.
The average duration of the samples in this dataset is around 10-15 seconds.
You may refer to an issue I created while fine-tuning: https://github.com/NVIDIA/NeMo/issues/13825. The moral of the whole story is that we need a bigger GPU with more memory, not the ones offered by Kaggle and Colab. I am making more changes and will keep you updated in this thread (https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/discussions/47) and on the GitHub issue.
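If you want to squeeze it into less VRAM, the main knobs are the batch size and the maximum audio duration in the training data config. A rough sketch of overriding them from Python is below; the numbers are illustrative and the exact config keys can differ between NeMo versions, so treat this as a starting point.

```python
from omegaconf import open_dict

# Continuing from a loaded `model` (see the encoder-freezing sketch in my first post).
# This assumes train_ds/validation_ds in the config already point at your manifests.
with open_dict(model.cfg):
    model.cfg.train_ds.batch_size = 4
    model.cfg.train_ds.max_duration = 15.0   # seconds; longer clips are dropped
    model.cfg.validation_ds.batch_size = 4

# Rebuild the dataloaders with the updated config.
model.setup_training_data(model.cfg.train_ds)
model.setup_validation_data(model.cfg.validation_ds)
```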
Hi~ thanks for sharing this information.
May I ask how is your performance after fine-tuning?
Here are some issues I faced while fine-tuning this model.
https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/discussions/47
https://github.com/NVIDIA/NeMo/issues/13825
The work is ongoing; I will post any new information as soon as it is available.
I request anyone who has an A100 GPU, or at least an A16, to try fine-tuning if they can.