Questions about the eos_token, bos_token, and pad_token settings

#7
by cl-modelcloud - opened

The three special-token settings in the tokenizer_config.json file are as follows:

"eos_token": "<|end_of_text|>",
"bos_token": "<|begin_of_text|>",
"pad_token": "<|end_of_text|>",

but in the config.json file, they are:

"bos_token_id": 0,
"eos_token_id": 11,
"pad_token_id": 0,

These three token IDs correspond to the following tokens:
"bos_token_id": ">>TITLE<<",
"eos_token_id": "<|end_of_text|>",
"pad_token_id": ">>TITLE<<",

Which setting is correct?
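
For reference, here is a quick way to see which IDs the tokenizer itself assigns to these tokens, and which tokens the IDs from config.json map back to (a minimal sketch using transformers; the repo ID below is a placeholder, substitute this model's actual repository):

from transformers import AutoTokenizer

# Placeholder repo ID - replace with this model's actual repository
tok = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b-instruct")

# Special tokens as declared in tokenizer_config.json, with the IDs the tokenizer assigns them
print(tok.bos_token, tok.bos_token_id)
print(tok.eos_token, tok.eos_token_id)
print(tok.pad_token, tok.pad_token_id)

# Tokens that the IDs from config.json (0 and 11) map back to
print(tok.convert_ids_to_tokens([0, 11]))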

Hi,
Thanks for spotting this ambiguity. It has been corrected now.

Gkunsch changed discussion status to closed