8GB GPU can run this, 10 t/s

#41
by wqerrewetw - opened

https://huggingface.co/mradermacher/QwQ-32B-i1-GGUF

llama-server.exe -m QwQ-32B.i1-IQ2_XXS.gguf -ngl 60 -fa -ctk q8_0 -ctv q8_0 --temp 0.6 --top-p 0.95 --top-k 30 -c 2048 -n -1 --host 0.0.0.0 --port 8080 --reasoning-format deepseek
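As a rough check on why these settings suit an 8GB card, here is a back-of-the-envelope estimate of the VRAM used by the layers offloaded with `-ngl 60` and by the q8_0 KV cache (`-ctk q8_0 -ctv q8_0`) at `-c 2048`. It assumes QwQ-32B has the Qwen2.5-32B layout (64 layers, 8 KV heads of dimension 128, about 31.0B non-embedding parameters) and that IQ2_XXS averages about 2.06 bits per weight. Real GGUF files mix tensor types and llama.cpp also allocates compute buffers, so treat the result as an approximate lower bound, not a measurement.

```python
# Rough VRAM arithmetic for the llama-server command above.
# ASSUMPTIONS (not from the post): Qwen2.5-32B layout for QwQ-32B
# (64 layers, 8 KV heads, head dim 128, ~31.0e9 non-embedding params)
# and a uniform ~2.06 bits per weight for IQ2_XXS layer tensors.

N_LAYERS = 64
N_KV_HEADS = 8
HEAD_DIM = 128
LAYER_PARAMS = 31.0e9            # non-embedding parameters, approximate
IQ2_XXS_BPW = 2.06               # average bits per weight, approximate
Q8_0_BYTES_PER_ELEM = 34 / 32    # q8_0 block: 32 int8 values + one fp16 scale


def kv_cache_bytes(n_ctx: int, n_layers: int) -> float:
    """Bytes for the K and V caches of n_layers layers stored as q8_0."""
    elems_per_token = 2 * n_layers * N_KV_HEADS * HEAD_DIM  # K plus V
    return n_ctx * elems_per_token * Q8_0_BYTES_PER_ELEM


def offloaded_weight_bytes(n_gpu_layers: int) -> float:
    """Approximate weight bytes for the layers moved to the GPU by -ngl."""
    per_layer = LAYER_PARAMS / N_LAYERS * IQ2_XXS_BPW / 8
    return n_gpu_layers * per_layer


if __name__ == "__main__":
    ngl, ctx = 60, 2048  # values taken from the command: -ngl 60 -c 2048
    print(f"KV cache on GPU: {kv_cache_bytes(ctx, ngl) / 2**30:.2f} GiB")
    print(f"weights on GPU:  {offloaded_weight_bytes(ngl) / 2**30:.2f} GiB")
```

Under these assumptions, 60 offloaded layers come to roughly 7 GiB of weights plus about 0.25 GiB of KV cache, which suggests why offloading all 64 layers would leave little headroom on an 8GB card.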

Speed is about 10 t/s.
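To compare generation speeds across setups on equal terms, one option is to read the timing data the server reports instead of eyeballing the console. This is a minimal sketch, assuming llama-server's `/completion` endpoint accepts `prompt` and `n_predict` and returns a `timings` object with `predicted_n` and `predicted_ms`; the prompt text and the `measure` and `generation_tps` helpers are hypothetical, for illustration only.

```python
import json
import urllib.request

# ASSUMPTION: the /completion response carries a "timings" object with
# "predicted_n" (generated tokens) and "predicted_ms" (generation time in ms).


def generation_tps(timings: dict) -> float:
    """Generated tokens per second, excluding prompt processing time."""
    return timings["predicted_n"] / (timings["predicted_ms"] / 1000.0)


def measure(host: str = "127.0.0.1", port: int = 8080, n_predict: int = 128) -> float:
    """Send one completion request to a running server and return its t/s."""
    body = json.dumps({
        "prompt": "Explain KV caching briefly.",  # hypothetical test prompt
        "n_predict": n_predict,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}:{port}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return generation_tps(json.load(resp)["timings"])
```

With the server from the command above running, `print(f"{measure():.1f} t/s")` would print the generation speed of a single request. Prompt processing is excluded, so the figure is comparable across different `-c` and prompt lengths.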


I guess this is an Nvidia GPU using CUDA, right? It's certainly not Vulkan: I have an 8GB AMD GPU running Vulkan, and I get about 2 t/s at best with Q2_K. I see you're running the imatrix (i1) version, which is a whole story in itself. For me personally, imatrix quants never work well: they never use the GPU at all and they are slower (probably because of the missing GPU offloading to begin with).
