Any chance of a 128k version so we can use it as a draft model for the larger 128k models?
Thanks!
I was literally just looking for 128k quants from unsloth and was sat here scratching my head like, where are they?
Looking for it as well, but also wondering if a 128k version is really necessary...
I'm using Qwen3-32B-128K-Q8_0.gguf with context size of 131072.
--model-draft Qwen3-0.6B-Q8_0.gguf
--draft-max 8
--draft-min 0
--ctx-size-draft 32768
--draft-p-min 0.5
--gpu-layers-draft 65
--override-kv tokenizer.ggml.bos_token_id=int:151643
--device-draft CUDA0
I have not tuned these params yet, so they are far from optimal, and using Q8 instead of Q4 for the draft model is certainly not a good idea here.
Along with YaRN:
--rope-scaling yarn
--rope-scale 4
--yarn-orig-ctx 32768
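For reference, here is a sketch of how those flags might come together in a single llama.cpp server invocation. This is an assumption-laden example, not a verified command: the model file paths and the binary location are placeholders, and you'd adjust layer counts and device names for your own hardware.

```shell
# Sketch: speculative decoding with a small Qwen3 draft model, plus YaRN
# context extension to 128K on the 32B target. Paths are placeholders.
./llama-server \
  --model Qwen3-32B-128K-Q8_0.gguf \
  --ctx-size 131072 \
  --rope-scaling yarn \
  --rope-scale 4 \
  --yarn-orig-ctx 32768 \
  --model-draft Qwen3-0.6B-Q8_0.gguf \
  --ctx-size-draft 32768 \
  --draft-max 8 \
  --draft-min 0 \
  --draft-p-min 0.5 \
  --gpu-layers-draft 65 \
  --device-draft CUDA0 \
  --override-kv tokenizer.ggml.bos_token_id=int:151643
```

The YaRN flags scale the target model's native 32K window (`--yarn-orig-ctx 32768`, `--rope-scale 4`) up to the full 131072-token context, while the draft model keeps its own smaller 32K context.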
Hey guys, as much as we'd love to release 128K quants, the small Qwen3 models don't support 128K context so only the large ones work :)
I see, I see! Thank you for clarifying, makes sense now :) Appreciate you taking the time to respond to our inquiry!