
Cannot load model due to Tokenizer issues.

#8
by Fhrozen - opened

Hello there.
Thank you for providing this excellent model.

I am running into an issue with transformers==4.55.0 and mistral_common==1.8.3.

It seems that this model is not supported by mistral_common, which I need installed for other models.
When I tried to load this model, I got the error:

File "/workspaces/venv/lib/python3.12/site-packages/transformers/tokenization_mistral_common.py", line 1767, in from_pretrained
    tokenizer_path = download_tokenizer_from_hf_hub(
File "/workspaces/venv/lib/python3.12/site-packages/mistral_common/tokens/tokenizers/utils.py", line 159, in download_tokenizer_from_hf_hub
ValueError: No tokenizer file found for model ID: ByteDance-Seed/Seed-X-PPO-7B

This happens because mistral_common only supports SentencePiece tokenizers ("*.model") or Tekken tokenizers:

https://github.com/mistralai/mistral-common/blob/d6d380a7fdeab2456c22400bfdc81c5210a78313/src/mistral_common/tokens/tokenizers/utils.py#L74-L91
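
To illustrate the failure, here is a minimal sketch of the kind of file check described above. It is not copied from mistral_common: the helper name `find_supported_tokenizer_files` and the two patterns are assumptions based on my reading of the linked code, and the file listing is hypothetical rather than the actual contents of this repository. A real listing could be fetched with `huggingface_hub.list_repo_files`.

```python
from fnmatch import fnmatch

# Assumed patterns, based on the description in this post
# (SentencePiece "*.model" files or a Tekken "tekken.json" file).
# They are NOT taken verbatim from mistral_common.
ASSUMED_PATTERNS = ("*.model", "tekken.json")


def find_supported_tokenizer_files(filenames):
    """Return the file names whose base name matches an assumed pattern."""
    matches = []
    for name in filenames:
        base_name = name.rsplit("/", 1)[-1]  # ignore any sub-folder prefix
        if any(fnmatch(base_name, pattern) for pattern in ASSUMED_PATTERNS):
            matches.append(name)
    return matches


# Hypothetical listing of a repository that ships only a Hugging Face
# "tokenizer.json" and no SentencePiece or Tekken file.
example_files = [
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
]

print(find_supported_tokenizer_files(example_files))  # prints []
```

An empty result like this one would correspond to the "No tokenizer file found" error shown in the traceback.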

Are you planning to update the model repository with the required file (tekken.json), or is there a workaround in the meantime?

Best.

Are there any updates or a solution for this?
