How to Fine-Tune the Model on a New Language?

#28
by alvynabranches - opened

I have around 75k English-Konkani sentence pairs. I want to fine-tune this model on those text sentence pairs.

Also, can we get zero-shot text-to-speech or speech-to-text (English input speech, Konkani output text) just by doing text-to-text fine-tuning?


Text-to-text fine-tuning is not available for SeamlessM4T. You can check out fine-tuning for ASR, S2ST, and S2TT in their GitHub repository: https://github.com/facebookresearch/seamless_communication/tree/main/src/seamless_communication/cli/m4t/finetune
Fine-tuning may not enable zero-shot transfer to other task combinations.
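If you do go the fine-tuning route, the trainer above takes JSON manifests (one record per line, as in `--train_dataset $DATASET_DIR/train_manifest.json`). A minimal sketch of generating one is below; note that the field names (`source`, `target`, `audio_local_path`, etc.) and the `"gom"` (Goan Konkani, NLLB-style) language code are assumptions of mine, so verify both against the dataset loader and the supported-language list in the seamless_communication repository:

```python
import json
from pathlib import Path

def write_manifest(pairs, out_path, src_lang="eng", tgt_lang="gom"):
    """Write a JSONL-style manifest: one JSON record per line.

    `pairs` is an iterable of (audio_path, src_text, tgt_text) tuples.
    NOTE: the field names and the "gom" language code are assumptions;
    check the seamless_communication dataset code for the exact schema.
    """
    with Path(out_path).open("w", encoding="utf-8") as f:
        for audio_path, src_text, tgt_text in pairs:
            record = {
                "source": {
                    "text": src_text,
                    "lang": src_lang,
                    "audio_local_path": str(audio_path),
                },
                "target": {"text": tgt_text, "lang": tgt_lang},
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# One toy pair; in practice, loop over all 75k aligned clips/sentences.
write_manifest(
    [("clip_0001.wav", "Hello, how are you?", "(Konkani translation)")],
    "train_manifest.json",
)
```

Run the same function again for the validation split to produce `validation_manifest.json`.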

I'm facing a model-loading issue; the model is not loading properly from the .cache storage:

mountpoint/seamless/src/seamless-communication$ torchrun --rdzv-backend=c10d --rdzv-endpoint=localhost:0 --nnodes=1 --nproc-per-node=2 --no-python m4t_finetune \
  --mode SPEECH_TO_TEXT \
  --train_dataset $DATASET_DIR/train_manifest.json \
  --eval_dataset $DATASET_DIR/validation_manifest.json \
  --learning_rate 1e-6 \
  --warmup_steps 100 \
  --max_epochs 10 \
  --patience 5 \
  --model_name seamlessM4T_v2_large \
  --save_model_to $DATASET_DIR/checkpoint.pt
[2025-05-08 06:01:39,452] torch.distributed.run: [WARNING]
[2025-05-08 06:01:39,452] torch.distributed.run: [WARNING] *****************************************
[2025-05-08 06:01:39,452] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2025-05-08 06:01:39,452] torch.distributed.run: [WARNING] *****************************************
2025-05-08 06:01:43,001 INFO -- seamless_communication.cli.m4t.finetune.dist_utils.25256: Rank=1 local rank=1, world_size=2, is_master=False
2025-05-08 06:01:43,156 INFO -- seamless_communication.cli.m4t.finetune.dist_utils.25255: Rank=0 local rank=0, world_size=2, is_master=True
2025-05-08 06:01:44,325 INFO -- seamless_communication.cli.m4t.finetune.dist_utils.25256: Setting cuda:1 as main device
2025-05-08 06:01:44,326 INFO -- seamless_communication.cli.m4t.finetune.dist_utils.25255: Setting cuda:0 as main device
Downloading the tokenizer of seamlessM4T_v2_large...
Using the cached tokenizer of seamlessM4T_v2_large. Set force to True to download again.
Traceback (most recent call last):
  File "/home/vishwam/anaconda3/envs/seamless/bin/m4t_finetune", line 8, in <module>
    sys.exit(main())
  File "/home/vishwam/anaconda3/envs/seamless/lib/python3.10/site-packages/seamless_communication/cli/m4t/finetune/finetune.py", line 148, in main
    text_tokenizer = load_unity_text_tokenizer(args.model_name)
  File "/home/vishwam/anaconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/models/utils/generic_loaders.py", line 353, in __call__
    return self._load(path, card)
  File "/home/vishwam/anaconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/models/nllb/loader.py", line 88, in _load
    return NllbTokenizer(pathname, langs, default_lang)
  File "/home/vishwam/anaconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/models/nllb/tokenizer.py", line 43, in __init__
    super().__init__(pathname, control_symbols)
  File "/home/vishwam/anaconda3/envs/seamless/lib/python3.10/site-packages/fairseq2/data/text/sentencepiece.py", line 142, in __init__
    self.model = SentencePieceModel(pathname, control_symbols)
RuntimeError: basic_filebuf::underflow error reading the file: Is a directory
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4.93M/4.93M [00:00<00:00, 19.8MB/s]
2025-05-08 06:01:46,875 INFO -- finetune.25255: Finetune Params: FinetuneParams(model_name='seamlessM4T_v2_large', save_model_path=PosixPath('/checkpoint.pt'), finetune_mode=<FinetuneMode.SPEECH_TO_TEXT: 'SPEECH_TO_TEXT'>, float_dtype=torch.float16, max_epochs=10, label_smoothing=0.2, warmup_steps=100, log_steps=10, eval_steps=50, patience=5, learning_rate=1e-06, train_batch_size=5, eval_batch_size=5, device=device(type='cuda'))
Downloading the checkpoint of seamlessM4T_v2_large...
  1%|β–Ž                                                                 | 45.8M/8.45G [00:02<05:42, 26.3MB/s][2025-05-08 06:01:49,771] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 25255 closing signal SIGTERM
/home/vishwam/anaconda3/envs/seamless/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
[2025-05-08 06:01:49,986] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 1 (pid: 25256) of binary: m4t_finetune
Traceback (most recent call last):
  File "/home/vishwam/anaconda3/envs/seamless/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/vishwam/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/vishwam/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/vishwam/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/vishwam/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/vishwam/anaconda3/envs/seamless/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

m4t_finetune FAILED

Failures:
  <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time      : 2025-05-08_06:01:49
  host      : gpu-frp-ds03-nc12s-v3.internal.cloudapp.net
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 25256)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
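The `RuntimeError: basic_filebuf::underflow error reading the file: Is a directory` means the tokenizer path fairseq2 resolved from its asset cache points at a directory rather than at the SentencePiece model file, which typically happens when an earlier download was interrupted or unpacked wrongly. A small diagnostic sketch follows; the cache location `~/.cache/fairseq2` is an assumption on my part (fairseq2 may use a different or environment-configured path), and `find_suspect_entries` is just a hypothetical helper name:

```python
from pathlib import Path

def find_suspect_entries(cache_root):
    """Return cache entries that are *directories* but carry a model-file
    suffix (e.g. a SentencePiece `.model`) -- the usual fingerprint of an
    interrupted or mis-extracted download."""
    root = Path(cache_root)
    if not root.exists():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_dir() and p.suffix in {".model", ".pt"}
    )

# Usage sketch (the cache location is an assumption; fairseq2 may use a
# different or env-configured path):
#   import shutil
#   for suspect in find_suspect_entries(Path.home() / ".cache" / "fairseq2"):
#       shutil.rmtree(suspect)   # force a clean re-download on the next run
```

After clearing any suspect entry, rerun the command so the tokenizer and checkpoint are downloaded fresh.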
