Yarn quantization for long context

#1
by sovetboga - opened

Is a separate quantized model with the YaRN settings baked in required to get 128k context, or does that not apply to GGUF — can it pick up the settings directly? And if a separate model is needed, will a 128k GGUF version be released?

It should be able to pick up the RoPE scaling on its own from the GGUF metadata, or you can set it with runtime settings.
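As a sketch of what "runtime settings" can look like with llama.cpp: it exposes flags to enable YaRN RoPE scaling at load time, so the same GGUF can be run at extended context without a separate quantization. The specific scale and original-context values below are illustrative assumptions, not the model's actual training configuration — check the model card for the correct numbers.

```shell
# Run a GGUF at 128k context with YaRN scaling applied at load time
# (example values: adjust --rope-scale and --yarn-orig-ctx to the model card).
llama-cli \
  -m model.gguf \
  -c 131072 \
  --rope-scaling yarn \
  --rope-scale 4 \
  --yarn-orig-ctx 32768 \
  -p "Hello"
```

If the GGUF metadata already encodes the YaRN parameters, the flags can usually be omitted and the runtime applies them automatically.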
