Will this method of quantization appear in Ollama?
Hello!
I would really like to run models quantized with this method on my (very weak) computer.
Can you tell me if this method will be available in Ollama?
Hi Regin,
Thanks for the suggestion!
I am very interested in adapting this method for use with Ollama.
When using VPTQ or similar methods, what are your most important requirements?
What device are you using? That would help us understand the use case and the motivation for integrating it into Ollama.
Ollama's backend is llama.cpp, so we could support VPTQ in llama.cpp as a first step.
You see, I'm using a relatively powerful laptop. That said, I only have 8 GB of memory. I might build myself a server for LLMs, but that's a question for another day.
I need a very small model at high quality.
I guess my priorities are RAG and programming.
In addition, I would like to train micro-models for my tasks. Is there any way to further train your quantized models, something like QLoRA?
As the VPTQ maintainer mentioned, they will release the quantization code (https://github.com/microsoft/VPTQ/issues/29), so I expect you will be able to quantize your own pre-trained model, or integrate QLoRA after quantization (see the sketch below).
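Not an official recipe, just a minimal sketch of what "QLoRA after quantization" could look like, using Hugging Face peft to attach trainable LoRA adapters to a frozen quantized base model. The model id and target module names are assumptions (Llama-style attention projections), and whether peft composes cleanly with VPTQ layers is untested:

```python
# Minimal QLoRA-style sketch (not official VPTQ/Ollama code):
# train small LoRA adapters on top of a frozen, quantized base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Hypothetical model id; substitute a real quantized checkpoint.
model_id = "your-org/your-quantized-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
base = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

lora_cfg = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,                         # adapter scaling factor
    target_modules=["q_proj", "v_proj"],   # assumption: Llama-style layer names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# get_peft_model freezes the base weights; only the adapters train.
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
# From here, fine-tune with your usual Trainer / training loop.
```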
Thanks!
So it will be possible to run and pre-train quantized models?
Yes, please wait a few weeks.
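In the meantime, the VPTQ repository already demonstrates running the community quantized checkpoints directly in Python. A minimal sketch, assuming the vptq package (pip install vptq) and the AutoModelForCausalLM wrapper its README shows; the model id is a placeholder for a checkpoint from the VPTQ-community collection on Hugging Face:

```python
# Minimal inference sketch, assuming the vptq package's
# AutoModelForCausalLM wrapper as shown in the VPTQ README.
import transformers
import vptq

# Placeholder id; pick a real checkpoint from the VPTQ-community collection.
model_id = "VPTQ-community/<quantized-model>"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = vptq.AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello, VPTQ!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```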