Q4 quants
Hi, can you please also upload Q4 quants? It seems the model cannot be converted to GGUF with the latest llama.cpp. I'm getting

NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()

while running

./convert_hf_to_gguf.py --outfile ../mellum-non-quant.gguf --verbose ../Mellum-4b-sft-python/
Use llama-quantize:
llama-quantize --allow-requantize Mellum-4B-SFT-Python.Q8_0.gguf Mellum-4B-SFT-Python.Q4_0.gguf Q4_0
Or you can use convert_hf_to_gguf_update.py, as the warning suggests. To do this, open convert_hf_to_gguf_update.py and replace the models list with:
models = [
    {"name": "gpt-2", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/JetBrains/Mellum-4b-sft-python", },
]
Don't forget to remove the folder models/tokenizers/gpt-2 from your llama.cpp checkout if it exists, since the update script caches downloaded tokenizers there.
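For example, from the root of the llama.cpp checkout:

```shell
# Delete the previously cached gpt-2 tokenizer, if present, so the
# update script downloads the Mellum tokenizer fresh.
rm -rf models/tokenizers/gpt-2
```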
Run ./convert_hf_to_gguf_update.py <YOUR_HF_TOKEN>; it should update convert_hf_to_gguf.py.
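For context on why the original conversion failed: convert_hf_to_gguf.py identifies the pre-tokenizer by hashing the tokenizer's behavior on a set of test strings and matching the hash against known entries in get_vocab_base_pre(); an unrecognized hash raises the NotImplementedError above, and the update script adds the missing hash entry. The sketch below is illustrative only — resolve_pre_tokenizer and its hash value are made up for this example, not the real function or hash from llama.cpp:

```python
# Hedged sketch of the hash-lookup logic in convert_hf_to_gguf.py's
# get_vocab_base_pre(). The function name and hash here are illustrative;
# the real script computes the hash from the model's actual tokenizer.
from hashlib import sha256

def resolve_pre_tokenizer(chkhsh: str) -> str:
    res = None
    # Illustrative entry of the kind convert_hf_to_gguf_update.py appends:
    if chkhsh == sha256(b"example-tokenizer-bytes").hexdigest():
        # ref: https://huggingface.co/JetBrains/Mellum-4b-sft-python
        res = "gpt-2"
    if res is None:
        # This is the failure mode reported in the question above.
        raise NotImplementedError("BPE pre-tokenizer was not recognized")
    return res
```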
Then run ./convert_hf_to_gguf.py --outfile ../mellum-non-quant.gguf --verbose ../Mellum-4b-sft-python/ again; it should work now.