YaRN not enabled correctly
#3, opened by CISCai
First off, the GGUFs are missing all the YaRN metadata. In addition to that, there's something not quite right with the context lengths: the original model's context length is 40960, not 32768, so a scaling factor of 4.0 should yield 163840, not 131072.
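To make the arithmetic concrete, here is a minimal sketch. It assumes the extended context is simply the original context length multiplied by the YaRN scaling factor. The helper name `yarn_context_length` is made up for illustration and is not part of any library.

```python
def yarn_context_length(original_ctx: int, factor: float) -> int:
    """Extended context length, assuming it is original length times scaling factor."""
    return int(original_ctx * factor)


config_ctx = 40960   # context length reported for the original model in this thread
assumed_ctx = 32768  # native length implied by the 131072 figure

print(yarn_context_length(config_ctx, 4.0))   # 163840
print(yarn_context_length(assumed_ctx, 4.0))  # 131072
```

So the 131072 figure only follows if the original context length is taken as 32768 rather than 40960.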
I've been wondering the same, as I've seen 40960 in Qwen's original model config file. Yet this is what Qwen's team states: "Context Length: 32,768 natively and 131,072 tokens with YaRN." - https://huggingface.co/Qwen/Qwen3-235B-A22B#processing-long-texts
So, not sure what's going on here.