not run #1
by rakmik - opened

https://github.com/kim90000/llama-3-8b-instruct-scb-gptq-2bit/blob/main/Untitled51.ipynb
You need to pass use_auth_token=YOUR_READ_TOKEN when loading the model, since it is a private repo. Please check here.
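For example, a minimal sketch with transformers (the repo id is the one from the notebook above; the token string is a placeholder for your own read token, and newer transformers versions call this parameter `token` instead of `use_auth_token`):

```python
# Minimal sketch: loading a private GPTQ repo with a Hugging Face read token.
# "hf_YOUR_READ_TOKEN" is a placeholder; create a token at
# https://huggingface.co/settings/tokens
# Loading a GPTQ checkpoint this way needs: pip install optimum auto-gptq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kim90000/llama-3-8b-instruct-scb-gptq-2bit"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token="hf_YOUR_READ_TOKEN")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                    # place layers on GPU, spill to CPU if needed
    use_auth_token="hf_YOUR_READ_TOKEN",
)
```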
Please also note that INT2 models perform very poorly in practice! We recommend using at least INT4 models!
Please check https://huggingface.co/shuyuej/Llama-3.3-70B-Instruct-GPTQ.
Please let us know if you have any other issues!
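If it helps, here is a hedged sketch of serving that INT4 model with vLLM. This is an assumption about your setup, not a tested recipe: the 70B INT4 weights alone are roughly 40 GB, so a single 16 GB card cannot hold them and the example assumes two larger GPUs with tensor parallelism.

```python
# Sketch: serving the recommended INT4 GPTQ model with vLLM.
# Assumption: enough total GPU memory across devices (weights are ~40 GB).
from vllm import LLM, SamplingParams

llm = LLM(
    model="shuyuej/Llama-3.3-70B-Instruct-GPTQ",
    quantization="gptq",
    tensor_parallel_size=2,  # assumption: two GPUs available
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain GPTQ quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```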
shuyuej changed discussion status to closed
The model you are referring to is larger than 40 GB, and I only have 16 GB of VRAM.
I would like to run 70B models in 2-bit.
Thank you.
Also, vLLM does not run on Windows 10.
I would like to run 70B models in 2-bit.
Is there any Python code to run Llama 70B or Qwen 72B in 2-bit on a 16 GB GPU or the free Colab T4? (See the sketch below.)
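For what it is worth, here is a hedged sketch of the closest approach I know of: loading a pre-quantized 2-bit GPTQ checkpoint with transformers and letting accelerate offload to CPU (and disk). Note the arithmetic: even at 2 bits, 70B weights are about 70e9 × 2 / 8 ≈ 17.5 GB, so they do not fully fit in 16 GB of VRAM, generation will be slow, and the quality warning about INT2 above still applies. The repo id here is hypothetical; substitute a real 2-bit checkpoint.

```python
# Hedged sketch: running a (hypothetical) pre-quantized 2-bit GPTQ 70B
# checkpoint on a 16 GB GPU with CPU/disk offload.
# Requires: pip install transformers accelerate optimum auto-gptq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "someuser/Llama-3-70B-Instruct-GPTQ-2bit"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                        # split layers across GPU and CPU
    max_memory={0: "14GiB", "cpu": "30GiB"},  # adjust to your host RAM; keep VRAM headroom
    offload_folder="offload",                 # spill what exceeds RAM to disk
)

inputs = tokenizer("Hello!", return_tensors="pt").to("cuda")  # embeddings sit on GPU 0
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

On a free Colab T4 the host RAM is also small, so expect heavy disk offload and very slow generation; this shows the mechanics rather than a practical setup.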