not run

#1
by rakmik - opened

https://github.com/kim90000/llama-3-8b-instruct-scb-gptq-2bit/blob/main/Untitled51.ipynb

You need to pass `use_auth_token=YOUR_READ_TOKEN` when loading the model, since it is a private repo. Please check here.
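A minimal sketch of what that looks like with `transformers`, assuming the repo id and token are placeholders you replace with your own (newer `transformers` versions accept `token=` instead of the deprecated `use_auth_token=`):

```python
def load_private_model(repo_id, hf_token):
    """Sketch: load a tokenizer and model from a private Hugging Face repo.

    `repo_id` and `hf_token` are placeholders -- substitute your own
    repo name and a read-scoped access token from your HF account.
    """
    # Imports kept inside the function so the sketch stays self-contained.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, use_auth_token=hf_token)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        use_auth_token=hf_token,  # required because the repo is private
        device_map="auto",        # place weights on GPU/CPU automatically
    )
    return tokenizer, model
```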

Please also note that the performance of INT2 models is poor in practice! We recommend using at least INT4 models!
Please check https://huggingface.co/shuyuej/Llama-3.3-70B-Instruct-GPTQ.

Please let us know if you have any other issues!

shuyuej changed discussion status to closed

The model you are referring to is larger than 40 GB, and I only have 16 GB of VRAM.

I would like to run 70B models in 2-bit.

thank you

vLLM does not run on Win10.

I would like to run 70B models in 2-bit.

Is there any Python code to run Llama 70B or Qwen 72B in 2-bit on a 16 GB GPU or a free Colab T4?
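A hedged sketch of loading a pre-quantized 2-bit GPTQ checkpoint on limited VRAM: the repo id and memory budget below are placeholders, the machine needs a GPTQ backend installed (e.g. `auto-gptq` or `gptqmodel`), and per the maintainer's note above, 2-bit output quality is poor:

```python
def run_2bit_gptq(repo_id, prompt, hf_token=None):
    """Sketch: generate text from an already-quantized 2-bit GPTQ repo.

    Assumptions (not from the thread): `device_map="auto"` with a
    `max_memory` cap lets Accelerate offload layers that do not fit in
    16 GB of VRAM to CPU RAM, at a significant speed cost.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, token=hf_token)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        token=hf_token,
        device_map="auto",                         # split across GPU and CPU
        max_memory={0: "15GiB", "cpu": "30GiB"},   # placeholder budget for a 16 GB card
        torch_dtype=torch.float16,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Even quantized to 2-bit, a 70B model's weights are roughly 18+ GB, so expect CPU offloading (and slow generation) on a 16 GB GPU or a Colab T4.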
