BOS token problem with speculative decoding

#2
by jagusztinl - opened

The 1.7B and 0.6B models have an extra BOS token compared to the 235B and other models. The extra token shows up as:
print_info: BOS token = 11 ','
Because of this mismatch, these small models cannot be used as draft models for speculative decoding with llama.cpp, which fails with:
main: draft model special tokens must match target model to use speculation
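The check llama.cpp performs can be illustrated with a small sketch. The dicts below are hypothetical stand-ins for each model's tokenizer metadata (the real values live in the GGUF / tokenizer_config.json files); the draft entry mirrors the reported stray BOS token:

```python
# Hypothetical special-token metadata for the target (235B) and the
# draft (0.6B/1.7B) models; values are illustrative, not authoritative.
target_cfg = {"bos_token": None, "eos_token": "<|im_end|>"}
draft_cfg = {"bos_token": ",", "eos_token": "<|im_end|>"}  # extra BOS (id 11, ',')

def special_tokens_match(target: dict, draft: dict) -> bool:
    """Speculative decoding requires the draft's special tokens to
    agree with the target's; any difference aborts speculation."""
    keys = ("bos_token", "eos_token")
    return all(target.get(k) == draft.get(k) for k in keys)

print(special_tokens_match(target_cfg, draft_cfg))  # False: draft adds a BOS token
```

Removing or aligning the draft model's BOS entry would make the check pass.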

Please update the 1.7B and 0.6B models so their special tokens are compatible with the larger models.