Q2_K or similar

#7 by K17 - opened

Would it be possible to upload Q2_K quants, or describe the way those quants were created so I can quantize them myself without bothering you further?

I'm struggling to load both the HI and LO noise models into 24 GB of VRAM at once. With Wan2.2, I found that the combination of a Q2_K high-noise model and a Q6 low-noise model produces really good-looking images, but even the smallest quants in this repo cause an OOM when loaded together.
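For reference, a rough back-of-the-envelope estimate of why that combination can fit. The parameter count (~14B per noise model) and the bits-per-weight figures for the quant types are assumptions for illustration, not measured values:

```python
# Rough VRAM estimate for loading two quantized models at once.
# Parameter count and bits-per-weight values are assumptions, not measured.
PARAMS = 14e9  # assumed parameter count per noise model

def model_gib(params: float, bits_per_weight: float) -> float:
    """Approximate on-disk/in-VRAM model size in GiB at a given bits-per-weight."""
    return params * bits_per_weight / 8 / 1024**3

hi = model_gib(PARAMS, 2.6)   # roughly Q2_K
lo = model_gib(PARAMS, 6.6)   # roughly Q6_K
total = hi + lo

print(f"high-noise ~{hi:.1f} GiB, low-noise ~{lo:.1f} GiB, total ~{total:.1f} GiB")
assert total < 24  # leaves headroom for activations on a 24 GB card
```

Under these assumptions the pair lands around 15 GiB, which is why the Q2_K + Q6 combination fits where two larger quants do not.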

I've also tried following the link from the README, but there seems to be no quantization script there, and its readme suggests using llama-quantize, which consistently crashes with some variation of:

`tensor 'patch_embedding.weight' has invalid number of dimensions: 5 > 4`
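That error is a limitation of the GGUF format itself, which stores tensors of at most 4 dimensions, while `patch_embedding.weight` is a 5D Conv3d weight. A minimal illustration of the problem (the shape values are made up, and the reshape shown is just one way to flatten to a storable rank — it is not the repo's actual conversion code):

```python
import numpy as np

# A Conv3d patch-embedding weight: (out_ch, in_ch, t, h, w) -> 5 dims.
# Shape values here are illustrative, not taken from the real model.
w = np.zeros((1536, 16, 1, 2, 2), dtype=np.float32)
assert w.ndim == 5  # GGUF can store at most 4 dimensions -> quantizer rejects it

# One possible workaround: flatten trailing dims so the tensor fits in <= 4.
w4 = w.reshape(w.shape[0], -1)  # (1536, 64) -> storable
assert w4.ndim <= 4
# The original 5D shape must be recorded elsewhere to restore it at load time.
```

This is why the conversion tooling has to handle such tensors specially before `llama-quantize` ever sees them.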

Sorry, I’m not working on this at the moment.

You can find the full instructions for using the quantization tools here: https://github.com/city96/ComfyUI-GGUF/tree/main/tools#readme

More details on llama-quantize:
https://github.com/city96/ComfyUI-GGUF/tree/main/tools#quantizing-using-custom-llamacpp

There you'll need to:

  • clone the llama.cpp repository
  • check out the correct branch
  • apply lcpp.patch
  • build the llama-quantize binary
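Those steps can be sketched as shell commands. The branch name is a placeholder and the build invocation is an assumption — check the tools readme for the exact branch the patch applies against and the recommended build flags:

```shell
# Sketch only: exact branch/tag and build flags come from the tools readme.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout <branch-from-readme>          # placeholder: see tools readme
git apply ../ComfyUI-GGUF/tools/lcpp.patch # path assumes repos are siblings
cmake -B build
cmake --build build --target llama-quantize
```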
