BOS token problem with speculative decoding

#2
by jagusztinl - opened

The 1.7B and 0.6B models have an extra BOS token compared to the 235B and other models. The extra token shows up as:
print_info: BOS token = 11 ','
Because of this mismatch, these small models cannot be used as draft models for speculative decoding with llama.cpp, which fails with:
main: draft model special tokens must match target model to use speculation
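The check llama.cpp performs can be illustrated with a small sketch. The dicts below are hypothetical stand-ins for each model's tokenizer metadata (the real values live in the GGUF / tokenizer_config.json files); the draft entry mirrors the reported stray BOS token:

```python
# Hypothetical special-token metadata for the target (235B) and the
# draft (0.6B/1.7B) models; values are illustrative, not authoritative.
target_cfg = {"bos_token": None, "eos_token": "<|im_end|>"}
draft_cfg = {"bos_token": ",", "eos_token": "<|im_end|>"}  # extra BOS (id 11, ',')

def special_tokens_match(target: dict, draft: dict) -> bool:
    """Speculative decoding requires the draft's special tokens to
    agree with the target's; any difference aborts speculation."""
    keys = ("bos_token", "eos_token")
    return all(target.get(k) == draft.get(k) for k in keys)

print(special_tokens_match(target_cfg, draft_cfg))  # False: draft adds a BOS token
```

Removing or aligning the draft model's BOS entry would make the check pass.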

Please update the 1.7B and 0.6B models so their special tokens are compatible with the larger models.