Questions about the eos_token, bos_token, and pad_token settings

#7
by cl-modelcloud - opened

The three special-token settings in the tokenizer_config.json file are as follows:

"eos_token": "<|end_of_text|>",
"bos_token": "<|begin_of_text|>",
"pad_token": "<|end_of_text|>",

but in the config.json file, they are:

"bos_token_id": 0,
"eos_token_id": 11,
"pad_token_id": 0,

These three token IDs correspond to the following tokens:
"bos_token_id": ">>TITLE<<",
"eos_token_id": "<|end_of_text|>",
"pad_token_id": ">>TITLE<<",

Which setting is correct?
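
For reference, here is a quick way to see which IDs the tokenizer itself assigns to these tokens, and which tokens the IDs from config.json map back to (a minimal sketch using transformers; the repo ID below is a placeholder, substitute this model's actual repository):

from transformers import AutoTokenizer

# Placeholder repo ID - replace with this model's actual repository
tok = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b-instruct")

# Special tokens as declared in tokenizer_config.json, with the IDs the tokenizer assigns them
print(tok.bos_token, tok.bos_token_id)
print(tok.eos_token, tok.eos_token_id)
print(tok.pad_token, tok.pad_token_id)

# Tokens that the IDs from config.json (0 and 11) map back to
print(tok.convert_ids_to_tokens([0, 11]))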

Hi,
Thanks for spotting this ambiguity. It has been corrected now.

Gkunsch changed discussion status to closed