8GB GPU can run this,10t/s
#41
by
wqerrewetw
- opened
https://huggingface.co/mradermacher/QwQ-32B-i1-GGUF
llama-server.exe -m QwQ-32B.i1-IQ2_XXS.gguf -ngl 60 -fa -ctk q8_0 -ctv q8_0 --temp 0.6 --top-p 0.95 --top-k 30 -c 2048 -n -1 --host 0.0.0.0 --port 8080 --reasoning-format deepseek
speed is about 10t/s
I guess this is Nvidia using Cuda, right? Certainly not Vulkan, because I have 8GB AMD GPU using Vulkan and I'm getting about 2 t/s at best with Q2_K. I see you're running imatrix version, that's a whole story in itself for me personally they never work well - they never utilize GPU at all and they are slower (probably due to lack of GPU offloading to begin with).