Conversion process

#1
by AlfredWALLACE - opened

Thanks for the quantized model, which allows us to test this great AI.
Would you share your conversion method? I was not able to do it myself with the llama.cpp scripts and would like to quantize more versions.

Sure @AlfredWALLACE.
You have to download and compile llama.cpp from GitHub:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_CUBLAS=1
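
LLAMA_CUBLAS=1 enables the CUDA backend; skip it for a CPU-only build. If you prefer CMake, the equivalent option at the time of writing should be LLAMA_CUBLAS there too (the binaries then end up under build/bin rather than the repo root):

mkdir build && cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release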

Then you need to create a Python environment and install llama.cpp's requirements:

 pip install -r requirements.txt
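
If you don't have an environment yet, a plain venv is enough; something like this, run from inside the llama.cpp folder:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt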

Then run the convert script to produce the f16 GGUF:

python ~/dev/llama.cpp/convert.py ./Magicoder-S-CL-7B --outtype f16
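
By default this writes ggml-model-f16.gguf into the model folder. If you want a different output path, convert.py also has an --outfile option; the filename below is just an example:

python ~/dev/llama.cpp/convert.py ./Magicoder-S-CL-7B --outtype f16 --outfile ./Magicoder-S-CL-7B/magicoder-s-cl-7b.f16.gguf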

Then run the compiled quantize binary, which is generated when you build llama.cpp:

quantize ./Magicoder-S-CL-7B/ggml-model-f16.gguf q5_k_m
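
quantize can also take an explicit output filename between the input file and the quant type, if you'd rather not rely on the auto-generated name. The output path below is just an example, and I'm assuming you run it as ./quantize from the llama.cpp directory:

./quantize ./Magicoder-S-CL-7B/ggml-model-f16.gguf ./Magicoder-S-CL-7B/magicoder-s-cl-7b.Q5_K_M.gguf q5_k_m

You can then sanity-check the quantized file with the main binary from the same build:

./main -m ./Magicoder-S-CL-7B/magicoder-s-cl-7b.Q5_K_M.gguf -p "Write hello world in Python." -n 128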

Good Luck.

Thanks! Before my post I had already tried quantizing with those same commands, but with the S-DS model, and I had no luck loading the result.

Try this fork, it will work for sure:
https://github.com/akhil3417/llama.cpp

Could you please explain what the changes or features in your fork are?

I merged the '417884e regex_gpt2_preprocess' PR.

Thanks! I'll try it. In the meantime, the GPTQ version works really well and also loads on low VRAM.
