Context Length

#2 opened by vacekj

Is there a way to increase the context length for this model to 128k, like the unsloth quants? I am using ik_llama.

I haven't tried it myself, but I'd suggest trying:

llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768

This is from the original Qwen model card here: https://huggingface.co/Qwen/Qwen3-30B-A3B#processing-long-texts, which has more info. Keep in mind the model card also warns that enabling YaRN like this can negatively impact performance at shorter context lengths.
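
Untested on my end, but a full invocation on mainline llama.cpp would look something like this (the model path is just a placeholder, and -c 131072 sets the enlarged context window):

# placeholder path; rope flags per the Qwen model card, 4x YaRN over the 32k native context
llama-server -m /path/to/Qwen3-30B-A3B-Q4_K_M.gguf -c 131072 --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768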

I don't think there is anything special about the unsloth quants, except that they may have baked these parameters into the GGUF kv metadata by default, and possibly used a different imatrix calibration strategy, though I've not seen that methodology documented in a repeatable way myself. I would love to see those details if anyone has a link!
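
One way to check what a given GGUF actually has baked in is to dump its kv metadata and look at the rope keys, e.g. with the gguf Python package (assuming I'm remembering the gguf-dump script name right; the filename is a placeholder):

# dump kv metadata and filter for rope/context keys
pip install gguf
gguf-dump /path/to/unsloth-quant.gguf | grep -iE 'rope|context_length'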

If this doesn't work on ik's fork, it might be possible to just pass in some --override-kv overrides to achieve the same result (rough sketch below). Let me know if you figure it out, otherwise I might look into it eventually. Cheers!
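
For reference, an untested sketch of those overrides; the key names are my assumption based on the usual GGUF metadata convention for the qwen3moe architecture, and the model path is a placeholder:

# untested: force YaRN rope scaling via kv overrides instead of the rope flags
llama-server -m /path/to/model.gguf -c 131072 \
  --override-kv qwen3moe.rope.scaling.type=str:yarn \
  --override-kv qwen3moe.rope.scaling.factor=float:4.0 \
  --override-kv qwen3moe.rope.scaling.original_context_length=int:32768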
