ValueError when attempting to train

#2
by sumomomomomo - opened

I'm getting the following error when attempting to train with accelerate_train_second.py. The pretrained_model entry in my yml file points to the provided Top_ckpt_24khz.pth.

ValueError: The checkpoint you are trying to load has model type whisper_encoder but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

Transformers is version 4.48.1

Traceback (most recent call last):
  File "C:\Users\Matt\miniconda3\envs\tsukasa-env\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 1071, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
                   ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Matt\miniconda3\envs\tsukasa-env\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 773, in __getitem__
    raise KeyError(key)
KeyError: 'whisper_encoder'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Github\Tsukasa_Speech\accelerate_train_second.py", line 1000, in <module>
    main()
  File "C:\Users\Matt\miniconda3\envs\tsukasa-env\Lib\site-packages\click\core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Matt\miniconda3\envs\tsukasa-env\Lib\site-packages\click\core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\Matt\miniconda3\envs\tsukasa-env\Lib\site-packages\click\core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Matt\miniconda3\envs\tsukasa-env\Lib\site-packages\click\core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Github\Tsukasa_Speech\accelerate_train_second.py", line 248, in main
    wl = WavLMLoss(model_params.slm.model,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\Github\Tsukasa_Speech\losses.py", line 219, in __init__
    self.wavlm = AutoModel.from_pretrained(model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Matt\miniconda3\envs\tsukasa-env\Lib\site-packages\transformers\models\auto\auto_factory.py", line 526, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Matt\miniconda3\envs\tsukasa-env\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 1073, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `whisper_encoder` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
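For reference, the failing call can be reproduced outside the training script. The sketch below is only a diagnostic, and it assumes the yml layout implied by the traceback (model_params.slm.model); the config path and key names are guesses, so adjust them to the actual file:

```python
# Diagnostic sketch, not a fix: reproduce the AutoModel call from losses.py
# outside the trainer. Config path and key names are assumptions based on the
# traceback above (model_params.slm.model); adjust to your actual yml.
import yaml
from transformers import AutoConfig, AutoModel

with open("Configs/config.yml") as f:
    cfg = yaml.safe_load(f)

slm_path = cfg["model_params"]["slm"]["model"]
print("slm.model resolves to:", slm_path)

# This is the call that fails: AutoConfig looks up the checkpoint's
# model_type ("whisper_encoder") in its registry and raises when it
# is not a recognized architecture.
config = AutoConfig.from_pretrained(slm_path)
model = AutoModel.from_pretrained(slm_path)
```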

Greetings.

I'm not sure; maybe the newer versions of Transformers have changed some things. Also, you shouldn't use the second stage's training script to fine-tune the model, as it only updates a certain part of the parameters and won't update the decoder.
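If the whisper_encoder checkpoint ships its own config/model classes, one possible workaround (an assumption, not something verified against this repo) is to register them with the Auto classes before WavLMLoss is constructed, so the custom model_type resolves instead of raising:

```python
# Possible workaround sketch (unverified): register the custom architecture so
# AutoConfig/AutoModel recognize model_type "whisper_encoder". The module and
# class names below are placeholders; use whatever actually defines the
# whisper_encoder checkpoint in your setup.
from transformers import AutoConfig, AutoModel
from my_whisper_encoder import WhisperEncoderConfig, WhisperEncoderModel  # hypothetical

AutoConfig.register("whisper_encoder", WhisperEncoderConfig)
AutoModel.register(WhisperEncoderConfig, WhisperEncoderModel)
```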
