Any chance of a 128k version so we can use it as a draft model for the larger 128k models?
Thanks!
I was literally just looking for 128k quants from unsloth and was sat here scratching my head like, where are they?
Looking for it as well, but also wondering if a 128k version is really necessary...
I'm using Qwen3-32B-128K-Q8_0.gguf with context size of 131072.
--model-draft Qwen3-0.6B-Q8_0.gguf
--draft-max 8
--draft-min 0
--ctx-size-draft 32768
--draft-p-min 0.5
--gpu-layers-draft 65
--override-kv tokenizer.ggml.bos_token_id=int:151643
--device-draft CUDA0
I have not tuned these params yet, so they are far from optimal, and using Q8 instead of Q4 for the draft model is certainly not a good idea here.
Along with YaRN:
--rope-scaling yarn
--rope-scale 4
--yarn-orig-ctx 32768
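For reference, here is a sketch of how those flags might come together in a single llama.cpp server invocation. This is an assumption-laden example, not a verified command: the model file paths and the binary location are placeholders, and you'd adjust layer counts and device names for your own hardware.

```shell
# Sketch: speculative decoding with a small Qwen3 draft model, plus YaRN
# context extension to 128K on the 32B target. Paths are placeholders.
./llama-server \
  --model Qwen3-32B-128K-Q8_0.gguf \
  --ctx-size 131072 \
  --rope-scaling yarn \
  --rope-scale 4 \
  --yarn-orig-ctx 32768 \
  --model-draft Qwen3-0.6B-Q8_0.gguf \
  --ctx-size-draft 32768 \
  --draft-max 8 \
  --draft-min 0 \
  --draft-p-min 0.5 \
  --gpu-layers-draft 65 \
  --device-draft CUDA0 \
  --override-kv tokenizer.ggml.bos_token_id=int:151643
```

The YaRN flags scale the target model's native 32K window (`--yarn-orig-ctx 32768`, `--rope-scale 4`) up to the full 131072-token context, while the draft model keeps its own smaller 32K context.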
Hey guys, as much as we'd love to release 128K quants, the small Qwen3 models don't support 128K context so only the large ones work :)
I see, I see! Thank you for clarifying, makes sense now :) Appreciate you taking the time to respond to our inquiry!