CPU Usage
Good day, I'm trying to use this model on a CPU-only device. I'm not too worried about inference time; my main issue is that we don't have the GPU resources. I have tried running this model, but it requires bitsandbytes and CUDA.
I have checked that Qwen2-VL runs perfectly on my device, so is this some configuration that I can't change because of the way you fine-tuned it using Unsloth? I'm not very familiar with Unsloth.
Thank you
@reganshen, we can try to work on a solution. I will keep you updated.
Thank you for the response @oddadmix. I did some further research, and I think the main reason it can't run on CPU is that you fine-tuned on the quantized Qwen model "unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit". Because that checkpoint is quantized with bitsandbytes, it is not compatible with CPU; even the base Unsloth Qwen model won't run on CPU. The only possible solution I can see for getting this working on CPU would be to set the model's torch_dtype to float16 before saving, but I tried to do this manually and it didn't work, so it might be the underlying Unsloth Qwen model that is the issue.
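For reference, here is roughly what that attempt looked like (a sketch only; it assumes the standard transformers Qwen2-VL classes and uses the base checkpoint named above):

```python
import torch
from transformers import Qwen2VLForConditionalGeneration

# Idea: load the 4-bit checkpoint, cast to float16, and re-save a plain
# checkpoint that no longer depends on bitsandbytes. On my CPU-only machine
# this never gets past from_pretrained, because loading the
# bitsandbytes-quantized weights requires CUDA.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit",
    torch_dtype=torch.float16,
)
model.save_pretrained("qwen2-vl-2b-instruct-fp16")
```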
So I believe it might not be possible to make the current model CPU-compatible, since the base Qwen model from Unsloth is the issue.
But please let me know if you come up with any solution.
Thanks
Can you try this one:
https://huggingface.co/oddadmix/Qari-OCR-0.2.2.1-VL-2B-Instruct-merged
This should be a merged model. Please try it and let me know if that works.
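If it helps, here is a minimal CPU-only loading sketch (assuming the standard transformers Qwen2-VL classes; adjust the processor and prompt handling to your pipeline):

```python
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_id = "oddadmix/Qari-OCR-0.2.2.1-VL-2B-Instruct-merged"

# The merged checkpoint ships plain weights, so no bitsandbytes/CUDA is needed.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # float32 is the safe choice on CPU
    device_map="cpu",
)
processor = AutoProcessor.from_pretrained(model_id)
```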
Thank you so much, it's running on my device.