not run #1
by rakmik - opened

https://github.com/kim90000/llama-3-8b-instruct-scb-gptq-2bit/blob/main/Untitled51.ipynb
You need to pass use_auth_token=YOUR_READ_TOKEN when loading the model, since it is a private repo. Please check here.
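For example, a minimal sketch with transformers (the repo id is the one from the notebook above; the token string is a placeholder for your own read token, and newer transformers versions call this parameter `token` instead of `use_auth_token`):

```python
# Minimal sketch: loading a private GPTQ repo with a Hugging Face read token.
# "hf_YOUR_READ_TOKEN" is a placeholder; create a token at
# https://huggingface.co/settings/tokens
# Loading a GPTQ checkpoint this way needs: pip install optimum auto-gptq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kim90000/llama-3-8b-instruct-scb-gptq-2bit"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token="hf_YOUR_READ_TOKEN")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                    # place layers on GPU, spill to CPU if needed
    use_auth_token="hf_YOUR_READ_TOKEN",
)
```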
Please also note that INT2 models perform very poorly in practice! We recommend using at least INT4 models!
Please check https://huggingface.co/shuyuej/Llama-3.3-70B-Instruct-GPTQ.
Please let us know if you have any other issues!
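If it helps, here is a hedged sketch of serving that INT4 model with vLLM. This is an assumption about your setup, not a tested recipe: the 70B INT4 weights alone are roughly 40 GB, so a single 16 GB card cannot hold them and the example assumes two larger GPUs with tensor parallelism.

```python
# Sketch: serving the recommended INT4 GPTQ model with vLLM.
# Assumption: enough total GPU memory across devices (weights are ~40 GB).
from vllm import LLM, SamplingParams

llm = LLM(
    model="shuyuej/Llama-3.3-70B-Instruct-GPTQ",
    quantization="gptq",
    tensor_parallel_size=2,  # assumption: two GPUs available
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain GPTQ quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```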
shuyuej changed discussion status to closed
The model you are referring to is larger than 40 GB, and I only have 16 GB of VRAM.
I would like to run 70B models in 2-bit.
Thank you.
Also, vLLM does not run on Windows 10.
I would like to run 70B models in 2-bit.
Is there any Python code to run Llama 70B or Qwen 72B in 2-bit on a 16 GB GPU or the free Colab T4? (See the sketch below.)
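For what it is worth, here is a hedged sketch of the closest approach I know of: loading a pre-quantized 2-bit GPTQ checkpoint with transformers and letting accelerate offload to CPU (and disk). Note the arithmetic: even at 2 bits, 70B weights are about 70e9 × 2 / 8 ≈ 17.5 GB, so they do not fully fit in 16 GB of VRAM, generation will be slow, and the quality warning about INT2 above still applies. The repo id here is hypothetical; substitute a real 2-bit checkpoint.

```python
# Hedged sketch: running a (hypothetical) pre-quantized 2-bit GPTQ 70B
# checkpoint on a 16 GB GPU with CPU/disk offload.
# Requires: pip install transformers accelerate optimum auto-gptq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "someuser/Llama-3-70B-Instruct-GPTQ-2bit"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                        # split layers across GPU and CPU
    max_memory={0: "14GiB", "cpu": "30GiB"},  # adjust to your host RAM; keep VRAM headroom
    offload_folder="offload",                 # spill what exceeds RAM to disk
)

inputs = tokenizer("Hello!", return_tensors="pt").to("cuda")  # embeddings sit on GPU 0
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

On a free Colab T4 the host RAM is also small, so expect heavy disk offload and very slow generation; this shows the mechanics rather than a practical setup.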