How to use SentencePiece tokenizer with the repo

#4 by AndreiAksionov

Hey there.
I have a question: the repo contains only one tokenizer file, tokenizer.model.
As I understand it, this is a SentencePiece tokenizer model.
The problem is that the instruct variant has additional tokens in added_tokens.json, and it can be tricky to extend an already-pretrained SP tokenizer with them.
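To illustrate what I mean, here is a minimal sketch (the paths are just examples, assuming the repo files are downloaded locally):

```python
# Minimal sketch: load the raw SentencePiece model and compare it
# against added_tokens.json (paths assume a local copy of the repo files).
import json
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
print(sp.vocab_size())  # base SentencePiece vocabulary only

with open("added_tokens.json") as f:
    added = json.load(f)  # token -> id mapping; ids sit past the SP vocab
print(added)  # these tokens are not part of tokenizer.model itself
```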

I know that AutoTokenizer can deal with this, but what if I want to use the SP tokenizer directly for the task (since a file for it exists)?
Or am I digging too deep, and is there an easier way to use a tokenizer with this model?
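For reference, this is the AutoTokenizer route I mean; it layers the added tokens on top of the SP model (the local path below is a placeholder):

```python
# Sketch of the AutoTokenizer route; "path/to/repo" is a placeholder
# for a local clone of this repository.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/repo")
print(len(tok))               # SP vocab size plus the added tokens
print(tok.get_added_vocab())  # the entries from added_tokens.json
```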
