Context Length
Is there a way to increase the context length for this model to 128k, like the unsloth quants? I am using ik_llama.
I haven't tried it myself, but I'd suggest trying:
llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
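Note those flags only change the RoPE scaling; I believe you also need to raise the context size itself to actually get the 128k window (32768 × 4 = 131072). A minimal sketch, with a hypothetical model filename:

llama-server -m Qwen3-30B-A3B-Q4_K_M.gguf -c 131072 --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768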
This comes from the original Qwen model card (https://huggingface.co/Qwen/Qwen3-30B-A3B#processing-long-texts), which has more details. Keep in mind the card also warns that enabling static YaRN scaling can degrade performance on shorter texts, so it's worth doing only when you actually need the long context.
I don't think there is anything special about the Unsloth quants, except that they may have baked these parameters into the GGUF kv metadata by default, and possibly used a different imatrix calibration strategy; I've not seen their methodology documented in a repeatable way myself. I would love to see those details if anyone has a link!
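One way to check what a given GGUF actually has baked in is to dump its kv metadata with the gguf Python package; a quick sketch, reusing the hypothetical filename from above (the rope keys are prefixed by the model's architecture name):

pip install gguf
gguf-dump --no-tensors Qwen3-30B-A3B-Q4_K_M.gguf | grep -i rope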
If this doesn't work on ik's fork, it might be possible to just pass some --override-kv overrides to achieve the same result; there's a sketch below. Let me know if you figure it out, otherwise I might look into it eventually. Cheers!
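For reference, --override-kv takes KEY=TYPE:VALUE. The equivalent overrides would look roughly like the following, but I'm guessing the exact key names from the usual {arch}.rope.scaling.* GGUF convention (assuming the qwen3moe architecture prefix here), so verify them against the metadata dump above first:

llama-server ... \
  --override-kv qwen3moe.rope.scaling.type=str:yarn \
  --override-kv qwen3moe.rope.scaling.factor=float:4.0 \
  --override-kv qwen3moe.rope.scaling.original_context_length=int:32768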