Missing Checkpoint Files

#5
by hcoxec - opened

Following on from another discussion: quite a lot of the intermediate checkpoints are incomplete and so unusable. There are three types of errors that appear over and over again. For reference, the same code that threw the errors below loads the other 32B checkpoints without issue, yet approximately 25% of the intermediate checkpoints are unusable.
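
For context, this is roughly the loading pattern involved (a sketch only; the repo ID below is a placeholder, not the actual repo, and each intermediate checkpoint lives on its own Hub branch selected via `revision`):

```python
# Sketch of the loading pattern. The repo ID is a placeholder.
# Intermediate checkpoints are Hub branches, selected via `revision`.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "org/model-32B"  # placeholder, not the actual repo ID
revision = "stage1-step170000-tokens1427B"

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)
```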

Most common: part of the weights are missing. I can give at least four examples, and I'm sure there are more given my difficulty using other checkpoints.

In some cases, the tokenizer does not appear to be there; for example, the checkpoint "stage1-step170000-tokens1427B" throws this error:

```
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2276, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gpt2/tokenization_gpt2.py", line 159, in __init__
    with open(merges_file, encoding="utf-8") as merges_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
```
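
This failure is consistent with the tokenizer files simply not being on that branch: `GPT2Tokenizer` needs `vocab.json` and `merges.txt`, and when `merges.txt` can't be resolved, `merges_file` is `None` and `open()` raises the `TypeError` above. A quick way to check what a branch actually contains (a sketch using `huggingface_hub`; the repo ID is a placeholder):

```python
# Sketch: list what a given checkpoint branch actually contains.
from huggingface_hub import list_repo_files

repo_id = "org/model-32B"  # placeholder, not the actual repo ID
files = list_repo_files(repo_id, revision="stage1-step170000-tokens1427B")

# GPT2Tokenizer needs vocab.json + merges.txt (or tokenizer.json for the
# fast path). If merges.txt is absent, merges_file resolves to None and
# open(None) raises the TypeError shown above.
print(sorted(f for f in files if "token" in f or f.endswith((".json", ".txt"))))
```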

In other cases, the tokenizer appears to be in a corrupted slow-tokenizer format that fails to convert:

```
ValueError: Converting from Tiktoken failed, if a converter for SentencePiece is available, provide a model path with a SentencePiece tokenizer.model file.Currently available slow->fast convertors: ['AlbertTokenizer', 'BartTokenizer', 'BarthezTokenizer', 'BertTokenizer', 'BigBirdTokenizer', 'BlenderbotTokenizer', 'CamembertTokenizer', 'CLIPTokenizer', 'CodeGenTokenizer', 'ConvBertTokenizer', 'DebertaTokenizer', 'DebertaV2Tokenizer', 'DistilBertTokenizer', 'DPRReaderTokenizer', 'DPRQuestionEncoderTokenizer', 'DPRContextEncoderTokenizer', 'ElectraTokenizer', 'FNetTokenizer', 'FunnelTokenizer', 'GPT2Tokenizer', 'HerbertTokenizer', 'LayoutLMTokenizer', 'LayoutLMv2Tokenizer', 'LayoutLMv3Tokenizer', 'LayoutXLMTokenizer', 'LongformerTokenizer', 'LEDTokenizer', 'LxmertTokenizer', 'MarkupLMTokenizer', 'MBartTokenizer', 'MBart50Tokenizer', 'MPNetTokenizer', 'MobileBertTokenizer', 'MvpTokenizer', 'NllbTokenizer', 'OpenAIGPTTokenizer', 'PegasusTokenizer', 'Qwen2Tokenizer', 'RealmTokenizer', 'ReformerTokenizer', 'RemBertTokenizer', 'RetriBertTokenizer', 'RobertaTokenizer', 'RoFormerTokenizer', 'SeamlessM4TTokenizer', 'SqueezeBertTokenizer', 'T5Tokenizer', 'UdopTokenizer', 'WhisperTokenizer', 'XLMRobertaTokenizer', 'XLNetTokenizer', 'SplinterTokenizer', 'XGLMTokenizer', 'LlamaTokenizer', 'CodeLlamaTokenizer', 'GemmaTokenizer', 'Phi3Tokenizer']
```
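
If the tokenizer is identical across pretraining checkpoints (an assumption I can't verify), a possible stopgap would be to take the tokenizer from the main branch and only the weights from the checkpoint revision:

```python
# Possible stopgap, assuming the tokenizer did not change during training.
# The repo ID is a placeholder, not the actual repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "org/model-32B"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)  # main branch
model = AutoModelForCausalLM.from_pretrained(
    repo_id, revision="stage1-step170000-tokens1427B"
)
```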

I would appreciate it immensely if the missing checkpoints could be made available in a usable format. Thanks so much!

On further review, I think I underestimated: it's looking like almost 50% of the checkpoints are unusable, with the missing-weights error being the most common.
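
A sweep along these lines can quantify this: list every checkpoint branch and flag revisions with missing tokenizer files or missing weight shards (a sketch; the repo ID is a placeholder, and the shard check assumes sharded safetensors with an index file):

```python
# Sketch: flag checkpoint branches with missing tokenizer or weight files.
import json

from huggingface_hub import HfApi, hf_hub_download

api = HfApi()
repo_id = "org/model-32B"  # placeholder, not the actual repo ID

for ref in api.list_repo_refs(repo_id).branches:
    files = set(api.list_repo_files(repo_id, revision=ref.name))
    has_tokenizer = "tokenizer.json" in files or {"vocab.json", "merges.txt"} <= files
    # For sharded checkpoints, every shard named in the index must exist.
    missing_shards = []
    if "model.safetensors.index.json" in files:
        index_path = hf_hub_download(
            repo_id, "model.safetensors.index.json", revision=ref.name
        )
        with open(index_path) as fh:
            expected = set(json.load(fh)["weight_map"].values())
        missing_shards = sorted(expected - files)
    if not has_tokenizer or missing_shards:
        print(ref.name, "tokenizer ok:", has_tokenizer, "missing:", missing_shards)
```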

Hey @hcoxec, thank you for reaching out. We have noticed this, and I have started re-uploading the checkpoints. The estimated timeline for this process to finish is Tuesday. I will update you once it is done.

Thank you! We were hoping to include some analysis of these for a deadline this week - so please let me know as soon as they're up so we can assess if that's still feasible.

Hey @hcoxec, if there are any particular checkpoints you need for analysis, I can fast-track them; they'll be up by tomorrow morning.

Unfortunately, we're looking at the whole time series, which is why we came across errors with so many different checkpoints. I appreciate the offer, but we really need all of them to be able to run this!

Separately - are there any plans to release earlier checkpoints of the 13B? Currently, its checkpointing schedule doesn't match the other models (1B, 7B, 32B), which means we can't include it in comparisons.

No, there are no plans to release earlier checkpoints of 13B.

Hey, all the checkpoints are fixed.

amanrangapur changed discussion status to closed
