Yarn quantization for long context
#1
by sovetboga - opened
Does YaRN require a separately quantized model with the settings baked in to reach 128k context, or does GGUF avoid that and apply the settings automatically? And if a separate model is needed, will a 128k GGUF version be released?
It should be able to set the RoPE scaling on its own, or you can override it with runtime settings.
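For example, with llama.cpp the YaRN scaling can typically be enabled at load time via CLI flags rather than requiring a separate 128k quantization. A minimal sketch, assuming llama.cpp's server binary; the model path and context values here are illustrative, not prescribed:

```shell
# Enable YaRN RoPE scaling at runtime instead of baking it into the GGUF.
# Flag names follow llama.cpp's CLI; adjust values for your model.
./llama-server -m model.gguf \
  --rope-scaling yarn \
  --yarn-orig-ctx 32768 \
  --ctx-size 131072
```

If the GGUF metadata already carries the YaRN parameters, the runtime can usually pick them up without any of these overrides.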