tokeniser.json and vocab files not found
#4 opened by ShieldHero
The tokenizer requires both the vocab and tokeniser.json files, but these files are not present in the repository. I am not able to initialise the tokeniser without them. Could someone please help me resolve this issue?
File "run.py", line 70, in
t = AutoTokenizer.from_pretrained(model_name)
File "/miniconda/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 532, in from_pretrained
return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1780, in from_pretrained
**kwargs,
File "/miniconda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1908, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py", line 150, in __init__
**kwargs,
File "/miniconda/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py", line 118, in __init__
"Couldn't instantiate the backend tokenizer from one of: \n"
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

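The last line of the error suggests a possible workaround while the files are missing: with sentencepiece installed, transformers should be able to convert the slow sentencepiece-based tokenizer into a fast one. A minimal sketch of that, assuming sentencepiece can be installed in the environment (the checkpoint name below is only a public stand-in, not this repository):

# A minimal workaround sketch, assuming sentencepiece is installed
# (pip install sentencepiece). The checkpoint below is only a public
# stand-in; the actual repository is not named in the snippet above.
from transformers import AutoTokenizer

model_name = "xlm-roberta-base"  # placeholder checkpoint, not this repo

# With sentencepiece available, transformers can convert the slow
# sentencepiece-based tokenizer into a fast one even when tokenizer.json
# is missing from the repository.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Alternatively, load the slow tokenizer directly and skip the conversion.
tokenizer_slow = AutoTokenizer.from_pretrained(model_name, use_fast=False)
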
ShieldHero changed discussion title from "tokenised.json and vocal files not found" to "tokeniser.json and vocal files not found"
Fixed via a2a45f84b9f7216de3462a96dc0fa0d65d441f9f
joeddav changed discussion status to closed
Thank you @joeddav
ShieldHero changed discussion title from "tokeniser.json and vocal files not found" to "tokeniser.json and vocab files not found"