YaRN not enabled correctly

by CISCai - opened 10 days ago

10 days ago

First off, the GGUFs are missing all the YaRN metadata. In addition, something is not quite right with the context lengths: the original model's context length is 40960, not 32768, so a scaling factor of 4.0 should yield 163840, not 131072.
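For reference, the arithmetic behind those two numbers can be checked with a minimal sketch. The function and variable names here are illustrative only and are not taken from any config file or library; the only assumption is that the extended context length is the original context length multiplied by the YaRN scaling factor.

```python
# Illustrative names only; the numeric values are the ones quoted in this post.
YARN_FACTOR = 4.0


def scaled_context(original_ctx: int, factor: float = YARN_FACTOR) -> int:
    """Return the extended context length: original length times the scaling factor."""
    return int(original_ctx * factor)


config_ctx = 40960   # context length reported for the original model
assumed_ctx = 32768  # context length this post argues is the wrong base

print(scaled_context(config_ctx))   # 163840
print(scaled_context(assumed_ctx))  # 131072
```

Which base is the right one is exactly the open question; the sketch only shows that each of the two extended lengths follows from one of the two bases.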

Thireus

5 days ago

edited 5 days ago

I've been wondering the same, as I've seen 40960 in Qwen's original model config file. Yet this is what Qwen's team states: "Context Length: 32,768 natively and 131,072 tokens with YaRN." - https://huggingface.co/Qwen/Qwen3-235B-A22B#processing-long-texts

So, not sure what's going on here.
